Computers and animals can learn in a similar way

Chess and Go computers can learn on their own to play well using a process called reinforcement learning. The mechanism is similar to how animals seem to learn, for example, when an animal tries to find a strategy that maximizes pleasure and minimizes pain or hunger. This behavior has been studied in humans and other animals for more than a century.

What do a child learning to ride a bike and Google trying a new shade of blue for the font on its search page have in common? In fact, the similarities are so fundamental that they are easy to overlook: An active agent is trying to achieve a goal. The kid is trying to move forward without falling, while Google is after extra engagement from its users. Both interact with their environment through trial and error, learning from previous attempts: "If I go too fast, I might fall," or "People click more on this shade."
The trial-and-error search and the (often delayed) reward make all these situations examples of reinforcement learning. In computer science, reinforcement learning is one of the three basic paradigms of machine learning, along with supervised and unsupervised learning. Here’s how it works: In its basic form, a decision maker can take different actions to change its current situation; the child might choose (mostly unconsciously) to lean a little to the left or to speed up. For each action, there is feedback, maybe a sudden fall or feeling a bit more stable on the bike. This feedback points the way to an overall improvement.
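The loop described above — try an action, receive feedback, improve — can be sketched in a few lines of Python. The two actions, their hidden success probabilities, and the exploration rate below are invented purely for illustration; this is a minimal trial-and-error learner, not a model of an actual cyclist.

```python
import random

# A toy "environment": two actions with different hidden chances of success.
# The agent does not know these values; it must discover them by trial and error.
TRUE_SUCCESS = {"lean_left": 0.2, "lean_right": 0.8}

def take_action(action):
    # Feedback is noisy, like a wobble on the bike: reward 1 or 0 at random,
    # with the hidden probability of success depending on the action.
    return 1.0 if random.random() < TRUE_SUCCESS[action] else 0.0

# The agent's running estimate of how good each action is.
estimate = {a: 0.0 for a in TRUE_SUCCESS}
counts = {a: 0 for a in TRUE_SUCCESS}

random.seed(0)
for trial in range(1000):
    # Explore occasionally; otherwise exploit the current best guess.
    if random.random() < 0.1:
        action = random.choice(list(TRUE_SUCCESS))
    else:
        action = max(estimate, key=estimate.get)
    reward = take_action(action)
    # Nudge the estimate toward the observed feedback (incremental average).
    counts[action] += 1
    estimate[action] += (reward - estimate[action]) / counts[action]

best = max(estimate, key=estimate.get)
print(best, round(estimate[best], 2))
```

After enough trials, the agent's estimate for the better action approaches its true success rate, and the agent mostly chooses it — the "overall improvement" that the feedback points toward.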

Of dogs, cats – and chess computers

For a computer or robot, of course, feedback is more abstract: it receives a numerical reward signal. This idea of implementing trial-and-error learning in a computer goes back almost as far as the earliest concrete thoughts about how to create artificial intelligence: in the mid-20th century, Alan Turing described a design for a “pleasure-pain system” that would reward a machine's successful actions and punish its unsuccessful ones. If this sounds reminiscent of Ivan Pavlov’s (1849-1936) description of conditioned reflexes in dogs or of Edward Thorndike’s (1874-1949) experiments on the learning process in cats, it is no coincidence: the concept of reinforcement learning is indeed inspired by early behaviorism.
In recent decades, reinforcement learning has achieved remarkable successes: programs trained with this paradigm can play highly complex games like Go or chess very well. It is also used in autonomous driving and in teaching robots human-like motor skills, such as running or flipping pancakes.

Understanding the algorithms of our brains

But there is more: the precise formalism of reinforcement learning has also proven useful for understanding animal and human behavior. Neuroscientists are gathering mounting evidence that the human brain employs mechanisms strikingly similar to reinforcement learning algorithms. This can help us understand how neurotransmitters work: dopamine, for instance, does not simply reward the brain with good feelings, as is sometimes suggested in the popular media. Instead, it seems to be involved in giving different areas of the brain an update on what reward to expect. This hypothesis was formulated by Peter Dayan and others using the mathematical rigor of reinforcement learning.
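The "update on what reward to expect" can be made concrete with a small temporal-difference sketch. The two-step trial, the reward size, and the learning rate below are invented for illustration; the point is only to show the prediction error — the quantity dopamine is hypothesized to signal — shrinking as a reward becomes expected.

```python
# A minimal temporal-difference sketch of the reward-prediction-error idea:
# a cue is reliably followed by a reward, and the "surprise" at the reward
# fades as the brain (here, a value table) learns to expect it.

ALPHA = 0.5                              # learning rate (illustrative value)
value = {"cue": 0.0, "reward": 0.0}      # learned reward expectations

def run_trial():
    """One conditioning trial: cue, then a reward of 1.0.
    Returns the prediction error at each step."""
    errors = {}
    # Step 1: cue appears; the error is the gap between what the next
    # moment promises and what the cue itself predicted.
    delta_cue = value["reward"] - value["cue"]
    value["cue"] += ALPHA * delta_cue
    errors["cue"] = delta_cue
    # Step 2: the reward (1.0) arrives; the error is actual minus expected.
    delta_reward = 1.0 - value["reward"]
    value["reward"] += ALPHA * delta_reward
    errors["reward"] = delta_reward
    return errors

first = run_trial()
for _ in range(50):
    last = run_trial()

# Early on, the prediction error spikes when the reward arrives;
# after many trials the reward is fully expected and the error is near zero.
print(first, last)
```

On the first trial the error at reward delivery is large (the reward is a surprise); after learning it vanishes — mirroring the classic observation that dopamine neurons stop responding to rewards that are fully predicted.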
Moreover, researchers in Peter Dayan’s Department of Computational Neuroscience use reinforcement learning to analyze how animals and humans predict and control their environment, learn, and make decisions: How much do we trust our own judgments? How do we decide which movie to watch? Why do we sometimes procrastinate on chores and tasks? How do we make decisions in the face of uncertainty? What makes us creative? How can all this go wrong in neurological and psychiatric disease? These are only a few examples; reinforcement learning has proven useful for understanding an enormous range of behaviors.