Suppose Chess and Go are actually easy, on some absolute scale of game difficulty that runs from easy to hard.

The reason we think they are hard (and fun), and why they have endured for so much of history, is that they fit how our human brains do pattern recognition and reasoning. Both games use a 2-D board with pieces placed on a grid, and a game state can thus be viewed as an image. We look at one image (a game state), and learn what the next image should be. The ideal move depends almost entirely on the current position rather than on the prior moves of the game (small exceptions such as castling rights in Chess or ko in Go can be folded into the state), so everything is nicely contained: you only need a single game state to determine what move to make next. Other things that make the games easier are that there are only two players, there isn’t any hidden information (both players can see the whole board), and the moves are discrete (a piece can’t be partially on a position).
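To make that concrete, here is a minimal sketch in Python of how a Go position can be encoded as an image-like stack of planes. The function name and the three-plane layout are my own illustrative choices, not AlphaZero’s actual input format.

```python
import numpy as np

BOARD_SIZE = 19  # standard Go board

def encode_position(black_stones, white_stones, black_to_move):
    """Encode a Go position as a stack of binary 'image' planes.

    Each plane is a BOARD_SIZE x BOARD_SIZE grid, so the whole state
    looks like a small multi-channel image. Illustrative encoding only.
    """
    planes = np.zeros((3, BOARD_SIZE, BOARD_SIZE), dtype=np.float32)
    for (r, c) in black_stones:
        planes[0, r, c] = 1.0  # plane 0: black stones
    for (r, c) in white_stones:
        planes[1, r, c] = 1.0  # plane 1: white stones
    planes[2, :, :] = 1.0 if black_to_move else 0.0  # plane 2: side to move
    return planes

# A tiny example position: two stones each, black to move.
state = encode_position(black_stones=[(3, 3), (15, 15)],
                        white_stones=[(3, 15), (15, 3)],
                        black_to_move=True)
print(state.shape)  # (3, 19, 19) -- channels x height x width, like an image
```

Everything the system needs to know is right there in that single array, which is exactly the self-contained property described above.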

It is interesting that for games like Chess and Go, where the board and its pieces look like a grid of pixels, the game state can be treated as an image. Vision is a primary sense for humans, and based on what we know about how vision is performed in the neocortex, the brain uses a generalized, repeating algorithm for pattern recognition over a time series of sensory inputs. Senses such as vision, sound, and touch all arrive as a series of inputs and are processed in roughly the same way by the neocortex. Games that are interesting to humans fit into this general kind of pattern recognition, and they become fun when that capability is pushed to its limits. But the very thing that makes these games hard and fun for us can, in a fundamental sense, make them easy for a machine learning system to solve.

Google DeepMind’s system AlphaZero can learn to play Go and Chess at a superhuman level in just hours. It does this by treating both games as a computer vision problem. The state of the art in computer vision today is the deep residual network (ResNet), a particular structure of neural network that for some problems (such as facial recognition) can outperform the human visual system. AlphaZero uses a ResNet to process a game state: it literally takes as input a visual representation of the game board. The output of the ResNet is the move to make (more precisely, a probability for each possible move, along with an estimate of how good the current position is). Training the network is quite involved, since the system starts with no knowledge of how to play the games well and must therefore learn by playing against itself. The game search spaces are enormous, so a huge number of self-play games are needed. DeepMind used parallel computing hardware (available only to Google) and wrote the AlphaZero software to learn on it. This collection of hardware can do roughly 140 quadrillion calculations per second, so it was a significant engineering feat to pull off AlphaZero. The resulting game-playing system, however, is pretty straightforward: a standard ResNet takes as input a game state presented as an image, and it outputs a move to make.
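To illustrate the shape of such a system, here is a minimal policy-and-value ResNet sketch in PyTorch. Everything here (class names, layer sizes, the number of input planes and residual blocks) is an assumption for illustration; AlphaZero’s real network is far deeper and differs in its details, such as batch normalization and convolutional output heads.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """One residual block: two conv layers plus a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        y = F.relu(self.conv1(x))
        y = self.conv2(y)
        return F.relu(x + y)  # the "residual" skip connection

class TinyPolicyValueNet(nn.Module):
    """Toy ResNet mapping a board 'image' to move probabilities and a value.

    Illustrative only; not AlphaZero's actual architecture.
    """
    def __init__(self, in_planes=3, channels=32, board_size=19, n_blocks=4):
        super().__init__()
        self.stem = nn.Conv2d(in_planes, channels, 3, padding=1)
        self.blocks = nn.Sequential(
            *[ResidualBlock(channels) for _ in range(n_blocks)])
        n_moves = board_size * board_size + 1  # every point, plus "pass"
        flat = channels * board_size * board_size
        self.policy_head = nn.Linear(flat, n_moves)
        self.value_head = nn.Linear(flat, 1)

    def forward(self, board):
        x = F.relu(self.stem(board))
        x = self.blocks(x)
        x = torch.flatten(x, start_dim=1)
        policy = F.log_softmax(self.policy_head(x), dim=1)  # move probabilities
        value = torch.tanh(self.value_head(x))              # who is winning, in [-1, 1]
        return policy, value

net = TinyPolicyValueNet()
board = torch.zeros(1, 3, 19, 19)  # batch of one empty-board "image"
policy, value = net(board)
print(policy.shape, value.shape)   # torch.Size([1, 362]) torch.Size([1, 1])
```

The point of the sketch is the input/output contract: a multi-channel board image goes in, and a distribution over moves (plus a position evaluation) comes out, exactly as in an image classifier.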

(I’ve glossed over a number of important details here, particularly something called Monte Carlo Tree Search (MCTS), which uses the ResNet to search ahead in the game to better determine the right move to make. But the point remains: the MCTS is being guided by a computer vision solution.)
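For the curious, here is a sketch of how the network can guide that look-ahead. The scoring rule below is a simplified form of the PUCT formula used in AlphaZero-style search; the move names, statistics, and prior values are hypothetical.

```python
import math

def puct_score(total_value, visits, parent_visits, prior, c_puct=1.5):
    """Score used to pick which move to explore next during tree search.

    Balances exploitation (average value of the move so far) against
    exploration (the network's prior for moves tried less often).
    Simplified form of the PUCT rule used in AlphaZero-style MCTS.
    """
    exploitation = total_value / visits if visits > 0 else 0.0
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return exploitation + exploration

# The ResNet's policy output supplies the priors; the search repeatedly
# descends the tree, following the highest-scoring move at each node.
priors = {"move_a": 0.6, "move_b": 0.3, "move_c": 0.1}            # hypothetical
stats = {"move_a": (2.0, 4), "move_b": (1.5, 2), "move_c": (0.0, 0)}  # (total value, visits)
parent_visits = sum(v for _, v in stats.values())

best = max(priors, key=lambda m: puct_score(stats[m][0], stats[m][1],
                                            parent_visits, priors[m]))
print(best)  # move_b: decent value so far, and still under-explored
```

This is why the search remains, at heart, a vision story: the priors that steer every step of the look-ahead come straight out of the image-processing network.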

I also don’t want to belittle the achievement that AlphaZero is: it’s the AI achievement of a lifetime (or at least of the year, given the pace of innovation in machine learning today). In doing AI research, we need to solve easier problems before tackling harder ones, and problems are always hard until you know how to solve them; then they become easy.

It’s going to be very interesting to see which hard problems become easy next!