Attention is all you need
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
Summary
The paper introduces the Transformer, a neural network architecture for sequence transduction that relies entirely on attention mechanisms, dispensing with recurrence and convolutions. The authors demonstrate its effectiveness on machine translation, achieving state-of-the-art results on both English-to-German and English-to-French tasks, and show that the model generalizes to other tasks such as English constituency parsing.
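The core operation behind the attention mechanisms the summary refers to is scaled dot-product attention. A minimal single-head sketch in NumPy (function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys) similarity scores
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted sum of value vectors

# Toy example: 3 queries attend over 5 key/value pairs of dimension 4.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

The 1/sqrt(d_k) scaling keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with vanishing gradients; the full model stacks several such heads in parallel (multi-head attention).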
Keywords
Transformer
attention mechanisms
machine translation
BLEU score
self-attention
Main claims
The Transformer model is based solely on attention mechanisms, eliminating the need for recurrence and convolutions.
The Transformer achieves superior translation quality while requiring significantly less training time than recurrent or convolutional models.
The model establishes new state-of-the-art BLEU scores on both English-to-German and English-to-French translation tasks.