Large Language Models are Great Tools but Lousy Researchers


Climate skeptic YouTuber Tony Heller asked ChatGPT some questions about the climate and found it lacking. It gave the sort of answers you would expect from an uncritical source, saying that the sky is falling because of climate change. Unfortunately, as Tony demonstrates, the answers it gave contradict the actual data. However, he shouldn’t have been surprised.

Anyone using a Large Language Model (LLM) for research purposes is going to be given a lot of duff data like this. I asked Bard, Google’s LLM, for Cicero’s definition of justice as used in De Re Publica (hereafter, the Republic), and it simply made one up, using a phrase that sounds Ciceronian but was actually pulled, as far as I could tell, from an unrelated essay on justice from 2006. When I pressed Bard for a source, it told me the quotation came from the Republic, Book III, Chapter 22. The trouble is that Chapters 20–22 of Book III of the Republic are lost. I asked it for the Latin, and it made that up as well.

Bard’s explanation for why it might do this is salutary (the second point is probably the most pertinent):

“There are a few reasons why large language models might make up quotations, references, and citations.

First, they are trained on a massive dataset of text and code, which includes a lot of information that is not always accurate. This can lead to them regurgitating information that they have learned, even if it is not true.

Second, they are designed to generate text that is similar to human-written text, and this can sometimes lead them to make up quotations, references, and citations in order to make their output more convincing.

Finally, they are still under development, and they are not perfect. This means that they sometimes make mistakes, and this can include making up quotations, references, and citations.

It is important to be aware of these limitations when using large language models, and to always check the accuracy of the information that they generate.”

This is why I am trying to stop using the term “AI” in relation to these LLMs. By their own admission, they don’t understand what they are putting out, which, by my definition (and, I think, the generally understood one), means they aren’t intelligent. They are models of how we use language, and very good ones, but they don’t think. That makes them useless (perhaps worse than useless) for critical research, and very good for lazy research, such as churning out bog-standard articles about how the world is warming.

This is why I am unconcerned that LLMs are going to achieve the human-like Artificial General Intelligence (AGI) that people like the Future of Life Institute are worried about. I was glad to see that Sam Altman of OpenAI thinks similarly; his conversation on this topic with Lex Fridman is worth a look.

LLMs are going to be a huge boon to us, making a lot of things easier, as the Microsoft demo of Copilot suggests. However, one of the things they aren’t yet good at is critical research. If one of my researchers started making up references or presenting conclusions at odds with the actual data, I’d fire them.

We need to be aware of what these amazing new tools are good at and what they are bad at, and adjust our expectations accordingly, without assuming malice or fearing Skynet.