Language Models are Few-Shot Learners
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
Summary
The paper presents GPT-3, a 175-billion-parameter autoregressive language model that shows strong few-shot performance across a wide range of NLP tasks. Rather than being fine-tuned per task, the model is conditioned at inference time on a natural-language task description and a handful of in-context examples, and it achieves results competitive with fine-tuned models on several benchmarks. The paper describes the training methodology, the zero-shot, one-shot, and few-shot evaluation settings, and the tasks tested, including translation, question answering, and reasoning. It also addresses the societal impacts and ethical considerations of deploying such powerful language models.
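To make the few-shot setting concrete, the sketch below assembles a prompt in the style of the paper's illustrative English-to-French translation example. The build_few_shot_prompt helper is my own illustration, not the authors' code; only the prompt format (a task description, K demonstration pairs, and an unanswered query) follows the paper.

```python
# Minimal sketch of few-shot "in-context learning" as described in the
# paper: the model sees a task description plus K demonstrations in its
# context window and completes the final query. No gradient updates are
# performed; learning happens purely through conditioning at inference.
# build_few_shot_prompt is an illustrative helper, not from the paper.

def build_few_shot_prompt(task_description, demonstrations, query):
    """Assemble a prompt: description, K input => output pairs, then
    the query input left open for the model to complete."""
    lines = [task_description]
    for source, target in demonstrations:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model generates the continuation
    return "\n".join(lines)

# Demonstration pairs echoing the paper's translation example.
demos = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
    ("cheese", "fromage"),
]

prompt = build_few_shot_prompt("Translate English to French:", demos,
                               "plush giraffe")
print(prompt)
# The assembled prompt would be fed to GPT-3, which autoregressively
# generates the French translation as the next tokens.
```

In the paper's terminology, the number of demonstration pairs K distinguishes the settings: K = 0 is zero-shot, K = 1 is one-shot, and K up to a few dozen is few-shot; only the contents of the context window change between them.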
Keywords
GPT-3
few-shot learning
language models
natural language processing
Main claims
GPT-3 demonstrates substantial improvements in few-shot performance compared to previous models.
Scaling up language models enhances their ability to perform tasks without task-specific fine-tuning.
GPT-3 can generate news articles that human evaluators have difficulty distinguishing from articles written by humans.
Despite its strengths, GPT-3 still struggles on certain datasets and tasks, particularly those requiring complex, multi-step reasoning, such as natural language inference.