Modern Hopfield Networks
We propose a new paradigm for deep learning that equips each layer of a deep learning architecture with modern Hopfield networks. This paradigm gives each layer functionalities such as pooling, memory, and attention. Associative memories date back to the 1960s and 1970s and became popular through Hopfield networks in 1982. Recently, Hopfield networks have seen a renaissance in the form of modern Hopfield networks, which offer a tremendously increased storage capacity and extremely fast convergence. We generalize modern Hopfield networks with exponential storage capacity to continuous patterns. Their update rule ensures global convergence to local energy minima, and they converge in one update step with exponentially low error. Surprisingly, the transformer attention mechanism is equivalent to the update rule of our new modern Hopfield network with continuous states. The new modern Hopfield network can be integrated into deep learning architectures as layers that allow the storage of and access to raw input data, intermediate results, or learned prototypes. These Hopfield layers enable new ways of deep learning beyond fully-connected, convolutional, or recurrent networks, and provide pooling, memory, association, and attention mechanisms. We demonstrate the broad applicability of Hopfield layers across various domains. Hopfield layers improved the state of the art on three out of four considered multiple instance learning problems as well as on immune repertoire classification with several hundreds of thousands of instances. On the UCI benchmark collections of small classification tasks, where deep learning methods typically struggle, Hopfield layers set a new state of the art when compared to different machine learning methods. Finally, Hopfield layers achieved state-of-the-art results on two drug design datasets.
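To make the link between the update rule and attention concrete, here is a minimal NumPy sketch. It assumes the update rule from the paper, ξ_new = X softmax(β Xᵀ ξ), with the stored patterns as the columns of X. The function names, dimensions, noise level, and β value are illustrative choices, not taken from the talk, and the learned projection matrices of a real transformer layer are left out.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def hopfield_update(X, xi, beta):
    """One update step of a continuous modern Hopfield network.

    X    : (d, N) array, stored patterns as columns
    xi   : (d,) array, current state (the query)
    beta : inverse temperature; larger values sharpen retrieval
    """
    p = softmax(beta * (X.T @ xi))  # association weights over the N patterns
    return X @ p                    # weighted average of stored patterns


def attention(Q, K, V, beta):
    """Plain dot-product attention without learned projections (assumption)."""
    scores = beta * (Q @ K.T)
    scores = scores - scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    w = w / w.sum(axis=1, keepdims=True)
    return w @ V


# Hypothetical toy setup: 10 random unit-norm patterns in 64 dimensions.
rng = np.random.default_rng(0)
d, N = 64, 10
X = rng.standard_normal((d, N))
X = X / np.linalg.norm(X, axis=0)

# Query: a noisy copy of stored pattern 3.
query = X[:, 3] + 0.05 * rng.standard_normal(d)

beta = 32.0
retrieved = hopfield_update(X, query, beta)

# The same computation written as attention with keys = values = stored patterns.
via_attention = attention(query[None, :], X.T, X.T, beta)[0]

print("closest stored pattern:", int(np.argmax(X.T @ retrieved)))
print("matches attention:", bool(np.allclose(retrieved, via_attention)))
```

In this sketch a single update step moves the noisy query very close to the stored pattern it was derived from, and the attention formulation returns the same vector, which is the equivalence the abstract refers to. In a transformer, β would typically be 1/sqrt(d_k) and the queries, keys, and values would first pass through learned linear maps.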
arXiv: “Hopfield Networks is All You Need”, https://arxiv.org/abs/2008.02217
- “Modern Hopfield Networks” (S. Hochreiter) https://youtu.be/bsdPZJKOlQs
- “Hopfield Networks in 2021” (S. Hochreiter & D. Krotov) https://youtu.be/k3YmWrK6wxo
- “Performers & Memory” (S. Hochreiter & K. Choromanski) https://youtu.be/BuDhitKZ_YY
Sepp Hochreiter heads the Institute for Machine Learning, the LIT AI Lab, and the AUDI.JKU deep learning center at the Johannes Kepler University of Linz, and is director of the Institute of Advanced Research in Artificial Intelligence (IARAI). He is regarded as a pioneer of deep learning because he discovered the fundamental deep learning problem: deep neural networks are hard to train because they suffer from the now famous problem of vanishing or exploding gradients. He is best known for inventing long short-term memory (LSTM), which he introduced in his 1991 diploma thesis and which was later published in 1997. LSTMs have emerged as best-performing techniques in speech and language processing and are used in Google’s Android, Apple’s iOS, Google Translate, Amazon’s Alexa, and Facebook’s translation. Currently, Sepp Hochreiter is advancing the theoretical foundations of deep learning and investigating new algorithms for deep learning and reinforcement learning. His current research projects include deep learning for climate change, smart cities, drug design, text and language analysis, vision, and, in particular, autonomous driving.