Demystifying GPT-3: “It’s impressive but it doesn’t know the world as we do”

There has been quite a lot of hype around GPT-3 in the AI world. Is it worth it? Our MT Engineer Ahmad Taie says: yes and no.

Ahmad Taie
Feb 26, 2021

There has been quite a lot of hype around GPT-3 in the AI world. The latest version of the “Generative Pre-Trained Transformer” created by the research company OpenAI is often described as revolutionary. We asked our MT Engineer and Applied Research Lead Ahmad Taie whether the technology is worth all the fuss.

Ahmad, why are people so excited about GPT-3?

It’s a language model that can produce quite coherent and plausible text from the prompts you give it. The results sound very natural, and it’s quite hard to distinguish them from text written by a person. This means that GPT-3 has managed to form a pretty useful model of human language, one that requires a lot less context than the models we’ve seen before. That alone is impressive and worthy of the public’s attention. Getting natural language right has always been tricky because it’s unpredictable by nature compared to more controlled environments like chess, for example. Another reason for all the excitement is the question of how technology like this might affect humankind and society. As it develops further and keeps getting better, some people - including its creators - see a potential threat in its misuse, given how easily it could be used to generate text that spreads disinformation, for example. This is one of the reasons OpenAI decided not to release it to the public.

New developments in AI more often than not turn into discussions about how technology might replace humans - journalists, for example, since GPT-3 might just write the articles itself. What do you think about that?

GPT-3 generates impressive output. However, when you interact with the model, you also realize that it doesn’t have the same understanding of the world as we do. It’s mainly trying to generate coherent text, and that text doesn’t always make sense. On top of that, the model lacks intent, so you still need a human to trigger it by telling it what to generate.

What does GPT-3 do and how does it work?

Size - meaning a huge number of parameters - is a crucial part of this model’s success. It’s also trained on a large amount of text data with the quite simple objective of predicting the next word given a sentence. This is also how people use it after it has been trained: given a snippet of text as input, it generates a completion of that text. It’s like a very clever auto-complete, if you will. So to program it, you provide text examples of the pattern you would like the tool to generate. From this, you can basically get it to produce anything that has a language structure: a paragraph of text, a summary, answers to questions, or even computer code. If you can formulate it as a text task, GPT-3 can take a shot at it.
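The “clever auto-complete” idea can be sketched with a toy next-word model. To be clear, this is not how GPT-3 works internally - GPT-3 is a huge Transformer network, not a table of word-pair counts - but it illustrates the same objective of completing a prompt by repeatedly predicting the next word. The corpus and function names here are invented for illustration:

```python
from collections import Counter, defaultdict

def train_next_word_model(corpus):
    """Count, for each word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    words = corpus.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def complete(model, prompt, max_words=5):
    """Greedily extend the prompt with the most frequent next word."""
    words = prompt.lower().split()
    for _ in range(max_words):
        followers = model.get(words[-1])
        if not followers:
            break  # the model has never seen this word, so it stops
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

corpus = "the cat sat on the mat and the cat slept on the mat"
model = train_next_word_model(corpus)
print(complete(model, "sat", max_words=2))  # extends the prompt word by word
```

GPT-3 does the same kind of prompt completion, only with a probability distribution over its whole vocabulary learned from vast amounts of text instead of simple co-occurrence counts, which is what lets it continue prompts it has never seen before.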

What do you find fascinating about GPT-3?

I think it's fascinating how a machine learning model can become very good just through this simple task of predicting the next word, done at scale, in terms of both model and dataset size. Of course, it still takes a lot of effort to find the right recipes to successfully train these models. But it's still very impressive what the creators of GPT-3 have achieved with the tools that exist today. I’m very excited to see what will be possible as the technology matures.

What does the continued development of GPT-3 mean for the field of Neural Machine Translation?

GPT-3 is one of the largest artificial neural network models currently out there. What it shows is that scaling up models and data can lead to more sample-efficient models - despite being a lot more computationally intensive. This lesson is important for us at Lengoo since we deal with low-resource domain data - our customers’ data - every day. While our NMT results are very good already, there’s always room to aim higher.

GPT-3 still has some hiccups, right?

Most deep learning models like this tend to fail, well, not very gracefully. If you spend some time playing around with it, you can figure out how to throw it a curveball with something creative or an out-of-the-ordinary way of writing. GPT-3 and similar machine learning models still show a lot of brittleness because they are trained to always generate something. So when you hit a part of the probability distribution the model isn’t well trained on, it can generate plain nonsense. Reducing these issues is a very active research area in AI.

So GPT-3 produces human-like sounding text. But is it really high-quality text?

I think the question we have to answer first is how to determine whether a text is high-quality. It’s a very subjective matter, and we haven’t yet found a way to measure this automatically and effectively. Think of English exams, for example, and how different teachers might evaluate and grade them differently. Since language is used to communicate with humans, it’s still best judged by the receivers of the language. If we could find a very good metric for “high-quality text”, we would be able to quickly develop models that generate exactly that. Until then, I think the main thing we can look for is how useful the text is and what kind of value it offers compared to text produced by humans.

Some critics claim that the tool fosters discrimination and promotes an unbalanced view of the world. Do you agree?

Well, I’d say the model itself isn’t discriminating because it doesn’t know good from bad. It just sees data. You can tell the system that some data is more important - as in more likely - than other data, which can help reduce bias. But if the goal is to make it perform well in the real world, it’s still important to actually show it data from the real world. To make the system more robust, you need to expose it to pieces of text that are not all “good”. This helps with the brittleness issues. In the end, it mainly depends on the task at hand. GPT-3 is just a tool. Tools are not a problem by themselves; the problems come from how people use them. A writer can use a model that generates evil text as a creative tool to help write the parts about a villain in their novel. However, it might not be the best idea to use the same model as a customer support tool for a bank.