Attention is all you need

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.

Summary

The paper introduces the Transformer, a neural network architecture for sequence transduction that relies entirely on attention mechanisms, dispensing with recurrence and convolutions. The authors demonstrate the Transformer's effectiveness on machine translation, achieving state-of-the-art results on both English-to-German and English-to-French benchmarks, and show that it generalizes to other tasks such as English constituency parsing.
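The attention mechanism at the heart of the Transformer is scaled dot-product attention, softmax(QKᵀ/√d_k)V, where each output position is a weighted sum of value vectors, with weights given by query-key similarity. A minimal NumPy sketch of this single-head operation (the paper's full model adds multi-head projections, masking, and positional encodings, which are omitted here):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D arrays of shape (seq_len, d_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # query-key similarities, scaled
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # each row sums to 1 over the keys
    return weights @ V                                    # weighted sum of value vectors

# Toy self-attention: queries, keys, and values all come from the same 3-token sequence.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))                           # 3 tokens, model dimension 4
out = scaled_dot_product_attention(X, X, X)
print(out.shape)                                          # (3, 4): one vector per token
```

The 1/√d_k scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with vanishing gradients.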

Keywords

  • Transformer

  • attention mechanisms

  • machine translation

  • BLEU score

  • self-attention

Main claims

  • The Transformer model is based solely on attention mechanisms, eliminating the need for recurrence and convolutions.

  • The Transformer achieves superior translation quality and faster training times compared to existing models.

  • The model establishes new state-of-the-art BLEU scores on both English-to-German and English-to-French translation tasks.

Thesify enhances academic writing with detailed, constructive feedback, helping students and academics refine skills and improve their work.
Subscribe to our newsletter

© 2025. All rights reserved.

Follow Us: