Data-Mining the World Champions

March 1, 2015

Today I’m going to show you the two most fascinating chess graphs I’ve ever seen. Of course, they’re also the only chess graphs I’ve ever seen, but still I found them utterly remarkable.

A statistician named Rob Weir did a principal component analysis of the opening repertoire of all 20 world champions (including Khalifman, Kasimdzhanov and Ponomariov, who I think shouldn’t be included, but whatever). What does this mean? Well, first it means that he tabulated the frequency of each of the 500 ECO opening codes in those grandmasters’ games, giving each player a vector in a 500-dimensional space. (We could call this the “ECO code vector” of that player.)
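To make the “ECO code vector” concrete, here is a minimal Python sketch. It assumes each game has already been labeled with its ECO code. It also assumes that “frequency” means each code’s share of the player’s games rather than a raw count, because the post doesn’t say which one Weir used. The names `ECO_CODES` and `eco_vector` are illustrative and do not come from any library.

```python
from collections import Counter

# The 500 ECO codes: five volumes, A through E, each numbered 00 to 99.
ECO_CODES = [f"{letter}{n:02d}" for letter in "ABCDE" for n in range(100)]

def eco_vector(game_codes, normalize=True):
    """Turn a player's games (one ECO code per game) into a 500-entry vector.

    Entry i counts the games classified as ECO_CODES[i]. With normalize=True
    (an assumption about what "frequency" means), each count is divided by
    the number of games, so the entries sum to 1.
    """
    counts = Counter(game_codes)
    unknown = set(counts) - set(ECO_CODES)
    if unknown:
        raise ValueError(f"not ECO codes: {sorted(unknown)}")
    vector = [counts[code] for code in ECO_CODES]
    total = sum(vector)
    if normalize and total:
        vector = [c / total for c in vector]
    return vector

# Hypothetical four-game sample, not real data.
v = eco_vector(["C60", "C60", "B20", "D30"])
print(len(v), v[ECO_CODES.index("C60")])  # 500 0.5
```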

Five hundred is a lot of dimensions. You’d like to have a simpler way of categorizing each person’s style. Principal component analysis is the first tool of choice for data miners who want to reduce the complexity (number of dimensions) of a data set.

For example, when Netflix ran a competition a few years ago to find the best method for predicting what movies a customer would like, data miners found that the two most important predictors were “Is it a chick flick or a guy flick?” and “Is it serious or escapist?” These categories “explained the variation” in customers’ tastes the best.

It’s very important to realize that the computer knows nothing about movies. It merely computes a number from all of its information about the movie: this movie scores 0.8 on factor 1 and -0.2 on factor 2. It has no idea what the factors mean. But when you put all the movies on a graph, it becomes instantly obvious to a human movie expert what factors 1 and 2 mean in the real world. On the left you see all the chick flicks, on the right you see all the guy flicks, on the top you see the serious flicks, and on the bottom you see the escapist flicks. One of the great triumphs of principal component analysis, as a data mining tool, is that so often the most important factors do have a real-world meaning that a human expert can readily identify.
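For readers who want to see the machinery, here is a minimal sketch of principal component analysis in plain Python. It uses power iteration to find the top components one at a time. This is not Weir’s code, since the post doesn’t say what software he used, and a real 20-player × 500-code analysis would normally go through a linear-algebra library. The function name `pca` and its parameters are illustrative. Notice that the output is only numbers. Each component is a direction in the data space, each row gets a score along it, and the sign of each component is arbitrary, so “open” could just as easily have landed on the left.

```python
import math
import random

def pca(rows, k=2, iters=1000, seed=0):
    """Top-k principal components of `rows` (equal-length numeric vectors).

    Returns (components, scores, explained): unit-length component directions,
    each row's score on each component, and the fraction of the total
    variance that each component explains.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - mean[j] for j in range(d)] for r in rows]  # centered data

    def cov_times(v):
        # Covariance matrix times v, computed as X^T (X v) / (n - 1)
        # without ever building the d x d covariance matrix.
        xv = [sum(a * b for a, b in zip(x, v)) for x in X]
        return [sum(X[i][j] * xv[i] for i in range(n)) / (n - 1) for j in range(d)]

    def orthogonalize(v, basis):
        # Remove the parts of v that lie along already-found components.
        for b in basis:
            dot = sum(a * c for a, c in zip(v, b))
            v = [a - dot * c for a, c in zip(v, b)]
        return v

    def normalize(v):
        norm = math.sqrt(sum(a * a for a in v))
        return [a / norm for a in v] if norm > 0 else v

    # Total variance = trace of the covariance matrix.
    total_var = sum(a * a for x in X for a in x) / (n - 1)
    rng = random.Random(seed)
    components, variances = [], []
    for _ in range(k):
        v = normalize(orthogonalize([rng.gauss(0, 1) for _ in range(d)], components))
        for _ in range(iters):
            # Power iteration, restricted to directions orthogonal to
            # the earlier components, converges to the next component.
            v = normalize(orthogonalize(cov_times(v), components))
        components.append(v)
        variances.append(sum(a * b for a, b in zip(v, cov_times(v))))  # variance along v
    scores = [[sum(a * b for a, b in zip(x, c)) for c in components] for x in X]
    explained = [var / total_var for var in variances]
    return components, scores, explained


# Toy data: four points, spread more widely left-right than up-down.
rows = [[2, 0], [-2, 0], [0, 1], [0, -1]]
components, scores, explained = pca(rows)
print([round(e, 3) for e in explained])  # [0.8, 0.2]
```

On this toy data, 80 percent of the variance lies along the first component (horizontal) and 20 percent along the second. These are the same kind of figures as the percentages Weir reports for his two factors.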

So Weir performed the same kind of analysis on the openings of all the World Champions, pulling out the two most significant factors. First, here is a graph of some common openings, showing how they score on the two factors.


The first thing we notice is that on the right we have all the “open” openings and on the left we have the “closed” ones. Thus, all by itself, without knowing anything about chess, the computer has identified that the most important characteristic of a world champion’s repertoire is whether he prefers open games or closed games. This factor explains 30.79 percent of the variation; or to put it another way, it reduces the uncertainty by 30.79 percent. If I tell you that a world champion preferred open games, it gives you that much more information about what his “ECO code vector” looks like.

The second factor, on the vertical axis, reduces the uncertainty by 11.27 percent. But the meaning of this one is much more elusive. Weir writes in his blog:

I’m having a harder time reading a real-world meaning into the second component.  Maybe a reader sees something here?

Well yes, in fact, this reader does! But the meaning only becomes apparent when we graph the world champions rather than the openings.


Do you notice something about this graph? It’s something that at first I thought was just a coincidence, but now I’m convinced that it’s real, and the computer has found a second meaningful descriptor of a world champion’s opening repertoire.

That descriptor is the era he played in. Just look at the champions in chronological order! We start with Steinitz and Lasker in the lower right. And then we gradually move counterclockwise: Capablanca and Alekhine come next, and then Euwe. By the time we get to the next world champion, Botvinnik, we are for the first time in negative territory on the “open-closed” scale. And we keep moving counterclockwise, all the way through Anand and Carlsen, the two most recent champions.
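“Counterclockwise” can be stated precisely. Measure angles around the origin of the graph, which is the average repertoire because PCA centers the data. Then the polar angle of each champion’s point should keep increasing as we move forward in time. The sketch below performs that check. Its coordinates are made up for illustration and are not read off Weir’s graph, and it assumes that consecutive champions are less than half a turn apart. The function names are hypothetical.

```python
import math

def polar_angles(points):
    """Polar angle (radians) of each (factor1, factor2) point about the origin,
    unwrapped so that a steady counterclockwise sweep gives increasing values."""
    angles = []
    for x, y in points:
        a = math.atan2(y, x)
        if angles:
            # Add or subtract full turns so consecutive angles differ by at most pi.
            while a - angles[-1] > math.pi:
                a -= 2 * math.pi
            while a - angles[-1] < -math.pi:
                a += 2 * math.pi
        angles.append(a)
    return angles

def is_counterclockwise(points):
    """True if the points, taken in order, sweep counterclockwise around the origin."""
    angles = polar_angles(points)
    return all(b > a for a, b in zip(angles, angles[1:]))

# Made-up coordinates, NOT Weir's data: lower right, upper right, upper left, lower left.
toy = [(1.0, -0.5), (0.8, 0.6), (-0.7, 0.5), (-0.6, -0.4)]
print(is_counterclockwise(toy))        # True
print(is_counterclockwise(toy[::-1]))  # False
```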

However, there is one world champion who is an exception to this story: Bobby Fischer. If Anand and Carlsen have a 2010s-era repertoire, it seems as if Fischer took his repertoire from the 2050s. He is the only champion who departed completely from the opening repertoire of his era.

Once again, I remind you: the computer knew nothing about chess when it identified factor 1 (open/closed) and it knew nothing about chess history when it identified factor 2 (time). From the data alone, it has pulled out what it considers the two most significant pieces of information, which together account for 42 percent of the variance in a world champion’s repertoire. If you tell me his open/closed preference and the era in which he played, you have given me 42 percent of the available information about his ECO code vector. Unless his name is Bobby Fischer.
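The 42 percent figure is just the sum of the two factors’ shares. In PCA, the share of variance explained by component k is its eigenvalue λ_k of the covariance matrix divided by the sum of all the eigenvalues. Because every share has the same denominator, the shares simply add:

```latex
\frac{\lambda_1 + \lambda_2}{\sum_j \lambda_j}
  = 0.3079 + 0.1127 = 0.4206 \approx 42\%,
\qquad 1 - 0.4206 = 0.5794 \text{ left for all the other components.}
```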

Why was Bobby Fischer so different (or as statisticians say, an outlier)? I think that this gets to the real message of the second graph. It’s telling us that opening styles evolve. Alekhine and Capablanca evolved from Steinitz and Lasker. The Soviet champions from Botvinnik through Kasparov also evolved, each one changing a little bit from his predecessor. This evolution always goes more or less counterclockwise. Why does it always go in the same direction? I have a hypothesis about that, too. A new generation of players always wants to differentiate itself from its chess “parents.” But it does not want to do that by doing what its chess “grandparents” did. So that means it will always move in the opposite direction from the grandparents. This perpetuates the counterclockwise opening cycle.

Fischer was the only world champion who did not evolve from his predecessors. He was sui generis, Latin for “one of a kind.” In his radical preference for 1. e4 as White, he was like a nineteenth-century player. In his preference for the Sicilian as Black, he was like a twenty-first century player. As a result, his point on the graph is kind of halfway between the two.

I was quite surprised to see that Anand and Carlsen are more Fischer-like than their predecessors, and the graph suggests that this trend will continue with the next champion after Carlsen. Maybe by the year 2050, we’ll once again have a truly Fischer-like world champion!

More French Cooking

February 28, 2015

My last post on the French Defense, a few days ago, attracted more comments than any post I’ve written for at least a couple years. So let’s continue the conversation.

One of the commenters (Brian Wall, I’m looking at you) asked, “What doesn’t beat the French?” Maybe White can do just about anything!

In the spirit of doing just about anything, I started playing around last year with 1. e4 e6 2. c4 d5 3. ed ed 4. cd (diagram).

Position after 4. cd. Black to move.

FEN: rnbqkbnr/ppp2ppp/8/3P4/8/8/PP1P1PPP/RNBQKBNR b KQkq - 0 4

First, does anybody know if this variation has a name?

I knew, of course, that White sometimes plays this way against the Caro-Kann, but I had never seen it in the French. Here are some of the reasons I felt that it was worth trying:

  1. Psychology. If there’s one thing a French player can count on, it’s having a pawn on d5. Not in this variation!
  2. Psychology. For my whole chess career, I’ve been afraid of playing isolated queen pawn positions. Here I’m committing myself to such a position on move four. Face your fears!
  3. Psychology. I’m a player who likes open lines. This variation should lead to positions with plenty of open lines.

Unfortunately, you can’t choose moves based on psychology alone. There also has to be some chess thought behind them. Like Mike Splane in my previous entry, I found what I think is a model game in this variation, the game Joel Benjamin – Paul van der Sterren from Munich 1994. I loved this game so much that I recorded a ChessLecture about it, called (if I remember correctly) “When Not to Think.”

The thing I loved about this game was that Benjamin played 20 obvious moves, none of which needed any tactical calculation at all. The first time he had to calculate anything was on move 21, when he played a piece sacrifice that won the game. Let’s see:

Benjamin – Van der Sterren

1. c4 e6 2. e4 d5 3. ed ed 4. cd Nf6 5. Bb5+ Nbd7 6. Nc3 Be7 7. d4 O-O 8. Nf3 Nb6 9. O-O Nbxd5 10. Re1 c6 11. Bc4 Be6 12. Bb3 h6 13. Ne5 Re8 14. Qf3 Bf8 15. Bd2 Nb4 16. Ne4 Nbd5 17. Ng3 Nd7 18. Rad1 Qc7 19. Nh5 f6 20. Ng6 Bd6?

Position after 20. … Bd6. White to move.

FEN: r3r1k1/ppqn2p1/2pbbpNp/3n3N/3P4/1B3Q2/PP1B1PPP/3RR1K1 w - - 2 21

Black’s last move was a mistake that gives White a winning combination. But even before 20. … Bd6, White was just about winning! I put the game on Rybka, and it rates the position as +1 pawn for White. And the only Black move that even does that well is 20. … Bf7, trying to trade off material. White’s best move is then 21. Qf5!, with the brutal plan of Nf4, Bc2, and Qh7+. Anyone would be happy with White’s attack here.

If Black had played, say, 20. … N7b6 instead of 20. … Bd6, White would have had a different sacrifice: 21. Bxh6! Black’s kingside pawns are exceptionally poorly placed: not only do they leave the light squares undefended, they are also (every one of them) targets for White’s pieces.

Is that enough of a hint for you to figure out White’s winning move? Benjamin played 21. Nxg7!, ripping open Black’s pawn formation. Van der Sterren thrashed a little bit with 21. … Kxg7 22. Qh5 Bxh2+ 23. Kf1! Bf4 24. Nxf4 Nxf4 25. Bxf4 Qxf4 26. Bxe6, but even though material is even, White is winning. Benjamin lifted a rook to g3 and van der Sterren had to give up his queen.

For those of you who’ve seen my ChessLecture, sorry about repeating myself, but I think it’s worth looking at the game a second time. White’s development was so natural. Bishops out, rooks to center, knight to e5, queen to f3, other knight to the kingside. And that isolated queen pawn that I was worried about? Not a factor.

I have played this variation twice now in tournament games, and I still haven’t reached a verdict. One was a win and one was a draw, but the results are a bit deceptive. In the win I thought I came out of the opening with little to no advantage, while in the draw I had a huge advantage and botched it. Neither of my opponents played van der Sterren’s 5. … Nbd7; one played 5. … Bd7 and the other chose to recapture on d5 a move earlier with 4. … Qxd5.

With the Two Knights variation and the Exchange variation and the Double Exchange variation (that’s what I’ll call this unless somebody can tell me a better name), it seems indeed that White has plenty of interesting offbeat methods against the French, and we haven’t even gotten to the main line of 2. d4 and 3. Nc3.

P.S. Some of you might have noticed that I have not recorded any ChessLectures yet this year. It wasn’t intentional at first, but I think I’m going to take a hiatus for a while. I’m expecting a very busy period of work in the next few months, and I don’t know if I can afford the time for the ChessLectures and the blog. But I’m not making any official announcements, because I’d like to be able to step right back into doing ChessLectures if I have the time and the creative muse to do it.

Playing All 50 Openings

February 26, 2015

About 50 years ago, the Yugoslav chess magazine Chess Informant introduced a new classification system for chess openings. For chess players it was like the invention of the metric system: it systematized the nomenclature that varied wildly from country to country. (For instance, your Spanish Opening is my Ruy Lopez.) Now all the openings and […]

The Eleventh Commandment

February 23, 2015

Yesterday I met for a short chess session with Gjon Feinstein, Mike Splane, and Eric Montany. Mike showed us a game he played at the Kolty Chess Club last week that features a new variation he is exploring in the French. He played it against an expert named Lev (I don’t remember the last name, […]

Three-Peat at the USAT West!

February 17, 2015

Last weekend there were two big chess events in California, and I didn’t go to either of them. Nevertheless, Facebook kept me abreast of some of the things happening in both tournaments, and I have some big landmarks to report. The U.S. Amateur Team Championship West was held in the southern part of the state, […]

1. a3

February 16, 2015

What do you think? Good? Bad? Ridiculous? It seems to me that with the move 1. a3, White is saying to his opponent, “I will agree to play Black, and I believe that in any opening you might choose to play, I will be able to find a variation in which a3 is a useful […]

February 11, 2015

Last week I finished reading a military-history book called Saratoga: Turning Point of America’s Revolutionary War. I am totally not a military history buff, but recently it bothered me to realize that I do not know a single battle of the American Revolution, other than Bunker Hill, which we lost. “How can you win a […]

You Will Be Assimilated

February 4, 2015

Sorry I’ve been away for a couple days… I went to an undisclosed top-secret location, where they’ve just had 19 inches of snow, and began the process of turning into a Borg. See? Resistance is futile. Anyway, when I got home I read in my Facebook feed that Mike Zaloznyy just got his Life Master […]

Introducing The Hive Mind

January 21, 2015

This Sunday I had the chance to meet with Gjon Feinstein, Mike Splane, and Eric Montany at a coffeehouse to go over my recent game with Ivan Ke. (See this post for some earlier discussion of the game.) We tore apart and dissected the game until there were only the smallest bones left, and I […]

Dinosaurs Roam the Earth in Dublin

January 20, 2015

I did not go to the Golden State Open in Dublin, California, last weekend. I have a lot of work this month and couldn’t prepare properly for a chess tournament. But I wish I had! It looks as if it was a great tournament, although not quite as loaded with strong players as the New Year Championship was. […]

