AlphaGo taught itself how to win, but without humans it would have run out of time

THE GUARDIAN

AlphaGo, the board-game-playing AI from Google’s DeepMind subsidiary, is one of the most famous examples of deep learning – machine learning using neural networks – to date. So it may be surprising to learn that some of the code that led to the machine’s victory was created by good old-fashioned humans.

The software, which beat Korean Go champion Lee Sedol 4–1 in March, taught itself to play the ancient Asian game by running millions of simulations against itself.

AlphaGo is built from a pair of neural networks, taught by a mixture of supervised learning (studying previous games played by humans) and reinforcement learning (playing against itself and learning from its mistakes). But some things, it turns out, just can’t be taught.

According to Thore Graepel, research lead at DeepMind, AlphaGo’s finished system was very good at working out what areas of the board to focus its thinking on, but not so good at working out when to stop thinking and actually play a move.

That’s a problem, because most competitive Go matches use a complex timing system: in the match played against Lee, for example, each player had a total of two hours to make all their moves, plus three renewable one-minute overtime periods, called “byo-yomi”, that they could play into once the two hours were up. Finish a move without using the whole of one byo-yomi period, and you can use it again next turn. Run it out, and you lose it forever. Run out all three, and you lose on time.
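Those rules are fiddlier than they sound, so a sketch may help. Below is a minimal Python model of the clock as just described; the class and its interface are invented for illustration, not taken from DeepMind’s code:

```python
class ByoYomiClock:
    """Minimal model of the timing rules described above: a main-time
    budget plus three renewable one-minute byo-yomi periods."""

    def __init__(self, main_time=2 * 60 * 60, periods=3, period_length=60):
        self.main_time = main_time          # main time left, in seconds
        self.periods = periods              # byo-yomi periods left
        self.period_length = period_length  # one minute per period

    def record_move(self, seconds_used):
        """Charge one move's thinking time; return False on loss by time."""
        if self.main_time > 0:
            self.main_time -= seconds_used
            if self.main_time >= 0:
                return True                 # still inside the two hours
            seconds_used = -self.main_time  # overrun spills into byo-yomi
            self.main_time = 0
        # A period survives any move that finishes inside it; every period
        # that runs out is gone for good.
        while seconds_used > self.period_length:
            seconds_used -= self.period_length
            self.periods -= 1
            if self.periods <= 0:
                return False                # all three gone: loss on time
        return True
```

One consequence, and part of what makes time management a genuine meta-game: once into byo-yomi, a player who always moves in just under a minute can keep playing indefinitely.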

“There’s this meta-game that’s being played,” explained Graepel. “Humans do quite sophisticated time management. They think much longer about difficult situations, and then play more reactively and quicker in other situations, and we tried to do this a little bit as well.

“Time is an important resource: the longer we can think about a move, the better the move will generally be, but there’s limited time. So we had some methods in place where, if we knew that the algorithm wouldn’t change its decision no matter what came out of the additional thinking time, we can determine that.”
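Graepel doesn’t spell out how that check worked, but a common way to get this kind of early stopping out of a Monte Carlo tree search (the search method at AlphaGo’s core) is to ask whether the runner-up move could still overtake the current favourite with the simulations the clock has left. A hedged sketch, with invented names:

```python
def can_stop_thinking(visit_counts, simulations_remaining):
    """Early-stopping test at an MCTS root: once no amount of remaining
    search can change which move has the most visits (the move that
    would be played), further thinking only wastes clock time.

    visit_counts: per-move visit counts at the root of the search tree.
    simulations_remaining: playouts the current time budget still allows.
    """
    ranked = sorted(visit_counts, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    # Even if every remaining simulation favoured the runner-up,
    # it could not catch the leader, so play the move now.
    return runner_up + simulations_remaining < best

# For example: can_stop_thinking([1200, 300, 50], 400) returns True,
# because 300 + 400 playouts can never catch 1200.
```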

Rather than building the timing rules into AlphaGo’s understanding of the game, however, the team bolted them on as an extra constraint. And unlike the core engine, the timing algorithm was ultimately written by hand.

It was still perfected algorithmically, though. “We optimised it through our evaluation system,” Graepel said. “So we had different curves that we were comparing. You know, use less time at the beginning and more later, or use more time at the beginning and less later … we tested which one played the best.”
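The article doesn’t reproduce those curves, but the shape of the experiment is easy to sketch: each candidate curve maps the state of the clock to a per-move time budget, and candidates are ranked by win rate over many games. Everything below (the curve shapes and the play_match harness) is an invented illustration, not DeepMind’s evaluation system:

```python
def front_loaded(remaining_time, moves_left_estimate):
    """Spend roughly double the even share of the clock on each move,
    so early moves get long thinks and the endgame is played fast."""
    return 2.0 * remaining_time / max(moves_left_estimate, 1)

def even_paced(remaining_time, moves_left_estimate):
    """Spread whatever is left on the clock evenly over the moves
    expected to remain."""
    return remaining_time / max(moves_left_estimate, 1)

def compare_curves(curve_a, curve_b, play_match, games=400):
    """Estimate curve_a's win rate by playing many games with one curve
    per side; play_match stands in for the evaluation harness and is
    assumed to return "a" or "b"."""
    wins = sum(play_match(curve_a, curve_b) == "a" for _ in range(games))
    return wins / games
```

Note that front_loaded still front-loads even though its multiplier is constant: spending double the even share early drains the clock faster, so later moves inherit a smaller remaining budget.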

So don’t worry too much about machines taking your job. There’s always something for you to do – even if that’s just manning the stopwatch.