The Infinity Machine

An exclusive extract from the new book ‘The Infinity Machine’, Sebastian Mallaby’s landmark account of Google DeepMind.

A graphic illustration in sci-fi style. A human hand appears from the top right, a computer-generated hand appears from the bottom left, and they touch index fingers. The image takes its reference from Michelangelo’s famous fresco The Creation of Adam (c. 1512) on the Sistine Chapel ceiling in Vatican City.


AT THE TIME of the ImageNet breakthrough, DeepMind was courting an AI scientist named Vlad Mnih, a Ukrainian-born Canadian who would be key to the company’s ambitions. Mnih was another soft-spoken Hinton protégé: He had a handsome, brooding presence, like a moody hero in a Russian novel. But although he was a Hintonite, Mnih was less tribal than many of his colleagues.

For the most part, Hinton’s deep-learning group in Toronto barely communicated with the premier center for reinforcement learning at the University of Alberta, where David Silver did his PhD. Mnih was an exception. After taking undergraduate classes from Hinton, he had completed a master’s degree in Alberta before returning to Toronto for his doctorate. Steeped in the teachings of both deep learners and reinforcement learners, Mnih wanted to blend the two techniques. He regarded the failure to combine them as a huge missed opportunity.

In Toronto, Hinton would say of deep neural networks, “This is how the brain works.” In Alberta, Richard Sutton, the luminary of reinforcement learning, would say of RL agents, “This is how the mind works.” The two professors had similar ambitions, and each had discovered a promising approach. “When you hear things like that, it’s like, ‘Why aren’t you guys working together?’ ” Mnih said.2

There were reasons for the Toronto–Alberta division. The reinforcement learners loved developing mathematical proofs showing that their systems worked in theory, even if they were difficult to build in practice. The deep learners were the opposite: They loved building systems that worked in practice, even if there was no elegant theory to explain them. A deep neural network was a mysterious black box: impressive when measured by its outward results, opaque when it came to its internal functioning.

One day when Mnih was a graduate student in Alberta, he asked his supervisor, a reinforcement‑learning theoretician named Csaba Szepesvári, why he did not take advantage of the advances in neural networks. Hinton had recently published his landmark 2006 paper on deep belief networks. Surely, Mnih urged Szepesvári, this was exciting stuff. For anybody working on AI, new opportunities beckoned.

“I know that in practice neural nets work,” Szepesvári confessed. “I just can’t prove anything about them, so I don’t use them.”4

Mnih might have been tempted to dismiss the reinforcement learners as a blinkered bunch, except that they were plainly on to something. Deep learning took you only so far: It could recognize patterns and make sense of data, but it could not create agents that interacted with their environments. This not only limited what deep learning could do; it limited what it could learn, since much human learning occurs through trial and error. By dropping an object, a child learns about gravity. By saying “please” and getting what she wants, she learns the value of good manners. Reinforcement learning equips machines to do the same: to act, and to learn by acting.

Relative to deep learning, with its mind-boggling nonlinearities and impressive practical results, reinforcement learning seemed theoretical and primitive. But to Mnih and other believers in RL, the promise of agents that could learn from experience remained thrilling. Whereas deep learning depended on the availability of training data (human-labeled cat photos, for example), reinforcement learning held out the hope that an AI could collect its own data by acting in the world and observing the consequences of its actions. In principle, there was no limit to the scope of such actions. An RL system could learn anything.
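
For readers who want to see the idea of collecting one's own data in miniature, here is a hedged sketch rather than anything DeepMind built. It uses a textbook epsilon-greedy "bandit" agent: three slot-machine levers with hidden payout rates, and an agent that learns which lever is best purely by pulling them and observing the rewards. The function name `run_bandit`, the payout rates, the step count, and the exploration rate `epsilon` are all illustrative assumptions.

```python
import random


def run_bandit(true_payouts, steps=5000, epsilon=0.1, seed=0):
    """Illustrative epsilon-greedy agent: act, observe a reward, update estimates."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_payouts)  # the agent's learned value of each action
    counts = [0] * len(true_payouts)       # how many times each action was tried
    for _ in range(steps):
        # Explore at random with probability epsilon; otherwise exploit the best guess.
        if rng.random() < epsilon:
            action = rng.randrange(len(true_payouts))
        else:
            action = max(range(len(estimates)), key=estimates.__getitem__)
        # The environment answers with a reward; no human-supplied label is involved.
        reward = 1.0 if rng.random() < true_payouts[action] else 0.0
        # Incremental average: nudge the estimate toward the observed reward.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


# Hypothetical payout rates, hidden from the agent.
estimates, counts = run_bandit([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=estimates.__getitem__)
print("preferred lever:", best)
print("learned values:", [round(e, 2) for e in estimates])
print("pulls per lever:", counts)
```

Nothing in the loop resembles a labeled photo: the only training signal is the reward that follows each action, which is the contrast the passage draws with deep learning as it was then practiced.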

In the summer of 2012, Mnih presented his PhD findings at an AI conference in Scotland. The conference was dominated by sober projects like his; futuristic schemes to build artificial general intelligence were nowhere on the agenda. But at the reception the first evening, the tone suddenly shifted. Two conference participants showed up at the party and announced that they were building AGI. They had a start-up in London. They were looking for recruits who believed in the mission.

Mnih’s first thought was that this pair sounded crazy. Hinton had warned his students to steer clear of overexcitable Singularity types, with their goofy, let’s-build-Terminators mindset. This AGI duo, who introduced themselves as Shane Legg and Daan Wierstra, appeared to fit that profile.6 But, as it happened, Mnih had an older brother called Andriy, who was also an AI scientist. Andriy had done a stint as a post-doc at University College London, where he had met Shane Legg. Now he assured Vlad that these AGI promoters were not as sketchy as they seemed. They talked AGI, but they were also real scientists.

The younger Mnih agreed to have coffee with Legg and the Terminator‑building Wierstra.

What do you want to work on? the DeepMinders asked him. Mnih gave the answer that usually got him a disdainful look: “I want to try combining neural nets with reinforcement learning.”

“That’s what we’re doing!” the pair answered delightedly. In Switzerland, where Legg and Wierstra had both done their PhDs, combining the two approaches was actually encouraged. Besides, neuroscience strongly suggested that reinforcement learning would be a necessary complement to deep learning. After all, the reward signals in reinforcement learning resembled the dopamine signals in the human brain. If the brain was the template for artificial general intelligence, RL would be indispensable to building it.7

A few weeks later, Mnih was invited to a video interview with Hassabis. In advance of the conversation, Hassabis sent over a link to his Wikipedia page. Reading it, Mnih discovered that Hassabis had been a chess prodigy, a superstar video game designer, and the five-time winner of the Mind Sports Olympiad. Now he felt intimidated.

Mnih dialed into the video call, unsure what to expect. Almost immediately, he was captivated. For one thing, Hassabis was surprisingly approachable. “I remember being struck by how humble he was and how easy it was to connect,” Mnih said later. For another, Mnih found Hassabis’s neuroscience perspective refreshing. “If you’re entrenched in academic computer science, you’re going to be thinking about the next practical step. But if you come at it from neuroscience, you understand the endpoint of intelligence.”9 Hassabis had clearly been passionate about building AI since childhood. “Pretty amazing person,” Mnih thought to himself.10

A graphic of a brightly lit orb intersecting a cube, representing the two colliding worlds of deep learning and reinforcement learning

Mnih also recognized in Hassabis that contagious conviction, a quality he had learned to appreciate during his time with the deep learners in Toronto. Precisely because there was no theoretical proof that neural nets would work, it mattered enormously that charismatic lab mates like Hinton and Sutskever insisted that they absolutely would work: confidence substituted for theory. “It’s this thing, you have to believe,” Mnih reflected.

I recalled a line from the Life Story movie, which had inspired Hassabis to apply to Cambridge. “I’m one of the believers,” says Watson, the codiscoverer of the structure of DNA. “Blessed are they who believed before there was any evidence.”

Mnih fully expected that DeepMind would go the way of most start-ups and soon be out of business. But by the end of the video call with Hassabis, he had decided to join anyway.

MNIH PACKED UP his life in Toronto and moved to London in May 2013, joining a steadily expanding research team at DeepMind’s new office on Bernard Street. The day he began, David Silver also became a full-time employee, overcoming his inhibitions after finding that the hours he spent with kindred spirits at DeepMind were more rewarding than the ones spent at his university laboratory.11 Silver had by now established himself as an authority on reinforcement learning, but the other newcomers at Bernard Street demonstrated DeepMind’s commitment to intellectual diversity.12

Two recruits worked on statistical methods for quantifying uncertainty and incorporating probabilities into models.13 Two had worked on deep learning at New York University under Yann LeCun.14 Others, including Wierstra, were focused on the intersection between artificial intelligence and human intelligence. A computational psychologist named Chris Summerfield had signed on, working alongside Hassabis’s coauthor, Dharshan Kumaran, in DeepMind’s fledgling neuroscience unit. For decades, disparate computer scientists, statisticians, psychologists, neuroscientists, physicists, and biologists had experimented with AI: The field was so balkanized that it barely existed. Now, at last, DeepMind was unifying it.

'The Infinity Machine' book jacket showing a blurred portrait of Demis Hassabis
Extracted from THE INFINITY MACHINE by SEBASTIAN MALLABY, published by Allen Lane, 31/03/2026. Copyright © Sebastian Mallaby 2026.

Footnotes

2. Vlad Mnih, author interview, October 6, 2023
4. In 2006, no less a futurist than Douglas Hofstadter, author of Gödel, Escher, Bach, had called out the Singularitarians for blending reasonable predictions with utterly wild stuff: utility foglets that could assemble themselves instantly into any object on earth, civilizations that commandeered the entire galaxy to do their information processing. See /r/21dotco, “Trying to Muse Rationally About the Singularity Scenario,” Medium, January 1, 2016, medium.com/@emergingtechnology/trying-to-muse-rationally-about-the-singularity-scenario-9c9db2eb9ece; “Douglas Hofstadter at Singularity Summit,” Vimeo, 2016, 34 min., 19 sec., vimeo.com/showcase/1777581/video/33633966
7. Shane Legg recalls that groups like Jürgen Schmidhuber’s in Switzerland bridged RL and deep learning. Wierstra adds that he had combined deep learning and RL for his PhD. Meanwhile, a positive attitude toward reinforcement learning also fitted Hassabis’s neuroscience perspective. Mnih, author interview; Shane Legg, author interview, November 22, 2023; Daan Wierstra, author interview, December 5, 2023
9. Mnih, author interview.
10. Mnih, author interview.
11. Silver, email to the author, November 22, 2024. Separately, Shane Legg recalls, “I had to convince Dave to join DeepMind. He was not comfortable about the prospect, given his experiences at Elixir. One of the ways we reassured him was that when he first came here, he reported to me.” Shane Legg, author interview, February 22, 2024.
12. Before joining DeepMind, Silver held a Royal Society University Research Fellowship, one of the most prestigious honors available to an early-career British scientist.
13. Joel Veness and Shakir Mohamed came from the so-called Bayesian tradition in machine learning. Joel Veness, author interview, January 23, 2024.
14. Yann LeCun supervised the PhD of Koray Kavukcuoglu and was a coauthor with Karol Gregor.

About the Author: Sebastian Mallaby is the author of several books, including 'The Power Law' and the New York Times bestseller 'More Money Than God'. He is a former Financial Times contributing editor and a two-time Pulitzer Prize finalist.