
AI tools make things up a lot, and that's a huge problem

2023-08-30 02:49

Before artificial intelligence can take over the world, it has to solve one problem. The bots are hallucinating.

AI-powered tools like ChatGPT have mesmerized us with their ability to produce authoritative, human-sounding responses to seemingly any prompt. But as more people turn to this buzzy technology for things like homework help, workplace research, or health inquiries, one of its biggest pitfalls is becoming increasingly apparent: AI models often just make things up.

Researchers have come to refer to this tendency of AI models to spew inaccurate information as "hallucinations," or even "confabulations," as Meta's AI chief said in a tweet. Some social media users, meanwhile, simply blast chatbots as "pathological liars."

But all of these descriptors stem from our all-too-human tendency to anthropomorphize the actions of machines, according to Suresh Venkatasubramanian, a professor at Brown University who co-authored the White House's Blueprint for an AI Bill of Rights.

The reality, Venkatasubramanian said, is that large language models — the technology underpinning AI tools like ChatGPT — are simply trained to "produce a plausible sounding answer" to user prompts. "So, in that sense, any plausible-sounding answer, whether it's accurate or factual or made up or not, is a reasonable answer, and that's what it produces," he said. "There is no knowledge of truth there."
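
To see what Venkatasubramanian means by a "plausible sounding answer" with "no knowledge of truth," consider a deliberately tiny sketch. The word table below is invented purely for illustration and is nothing like a production model, but the generation loop captures the key point: each next word is picked by how likely it is to follow the previous one, and no step checks the result against reality.

```python
import random

# A hand-written next-word table standing in for a trained language model.
# (Invented for this illustration; real models learn these statistics from vast text corpora.)
NEXT_WORD = {
    "<start>":  [("The", 1.0)],
    "The":      [("capital", 1.0)],
    "capital":  [("of", 1.0)],
    "of":       [("France", 0.5), ("Atlantis", 0.5)],  # a fictional continuation is just as available
    "France":   [("is", 1.0)],
    "Atlantis": [("is", 1.0)],
    "is":       [("Paris.", 0.7), ("underwater.", 0.3)],
}

def generate(max_words=10):
    word, output = "<start>", []
    while word in NEXT_WORD and len(output) < max_words:
        options, weights = zip(*NEXT_WORD[word])
        word = random.choices(options, weights=weights)[0]  # sample by likelihood only
        output.append(word)
    return " ".join(output)  # at no point is the sentence checked against facts

print(generate())  # might print "The capital of France is Paris." or "The capital of Atlantis is underwater."
```

A real system replaces the table with a neural network holding billions of learned parameters, but the loop is conceptually the same: it produces a fluent continuation, not a fact-checked one.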

The AI researcher said that a better behavioral analogy than hallucinating or lying, both of which carry connotations of error or ill intent, would be to compare these computer outputs to the way his young son told stories at age four. "You only have to say, 'And then what happened?' and he would just continue producing more stories," Venkatasubramanian said. "And he would just go on and on."

Companies behind AI chatbots have put some guardrails in place that aim to prevent the worst of these hallucinations. But despite the global hype around generative AI, many in the field remain torn over whether chatbot hallucinations are even a solvable problem.

What is an AI hallucination?

Simply put, a hallucination refers to when an AI model "starts to make up stuff — stuff that is not in-line with reality," according to Jevin West, a professor at the University of Washington and co-founder of its Center for an Informed Public.

"But it does it with pure confidence," West added, "and it does it with the same confidence that it would if you asked a very simple question like, 'What's the capital of the United States?'"

This means that it can be hard for users to discern what's true or not if they're asking a chatbot something they don't already know the answer to, West said.

A number of high-profile hallucinations from AI tools have already made headlines. When Google first unveiled a demo of Bard, its highly anticipated competitor to ChatGPT, the tool very publicly came up with a wrong answer in response to a question about new discoveries made by the James Webb Space Telescope. (A Google spokesperson at the time told CNN that the incident "highlights the importance of a rigorous testing process," and said the company was working to "make sure Bard's responses meet a high bar for quality, safety and groundedness in real-world information.")

A veteran New York lawyer also landed in hot water when he used ChatGPT for legal research and submitted a brief that included six "bogus" cases the chatbot appears to have simply made up. News outlet CNET was also forced to issue corrections after an AI-generated article gave wildly inaccurate personal finance advice when the tool was asked to explain how compound interest works.

Cracking down on AI hallucinations, however, could limit AI tools' ability to help people with more creative endeavors, such as asking ChatGPT to write poetry or song lyrics.

But there are risks stemming from hallucinations when people are turning to this technology to look for answers that could impact their health, their voting behavior, and other potentially sensitive topics, West told CNN.

Venkatasubramanian added that, at present, relying on these tools for any task where you need factual or reliable information that you cannot immediately verify yourself could be problematic. And there are other potential harms lurking as this technology spreads, he said, such as companies using AI tools to summarize candidates' qualifications and decide who should move on to the next round of interviews.

Venkatasubramanian said that ultimately, he thinks these tools "shouldn't be used in places where people are going to be materially impacted. At least not yet."

Can hallucinations be prevented?

How to prevent or fix AI hallucinations is a "point of active research," Venkatasubramanian said, but for now the problem remains very complicated.

Large language models are trained on gargantuan datasets, and multiple stages go into how an AI model learns to generate a response to a user prompt: some of that process is automatic, and some is shaped by human intervention.

"These models are so complex, and so intricate," Venkatasubramanian said, but because of this, "they're also very fragile." This means that very small changes in inputs can have "changes in the output that are quite dramatic."

"And that's just the nature of the beast, if something is that sensitive and that complicated, that comes along with it," he added. "Which means trying to identify the ways in which things can go awry is very hard, because there's so many small things that can go wrong."

West, of the University of Washington, echoed his sentiments, saying, "The problem is, we can't reverse-engineer hallucinations coming from these chatbots."

"It might just an intrinsic characteristic of these things that will always be there," West said.

Google's Bard and OpenAI's ChatGPT both warn users from the get-go that the tools may produce inaccurate responses, and both companies say they're working on solutions.

Earlier this year, Google CEO Sundar Pichai said in an interview with CBS' "60 Minutes" that "no one in the field has yet solved the hallucination problems," and "all models have this as an issue." On whether it was a solvable problem, Pichai said, "It's a matter of intense debate. I think we'll make progress."

And Sam Altman, CEO of ChatGPT-maker OpenAI, predicted during remarks in June at India's Indraprastha Institute of Information Technology, Delhi, that it will take a year and a half or two years to "get the hallucination problem to a much, much better place." "There is a balance between creativity and perfect accuracy," he added. "And the model will need to learn when you want one or the other."

In response to a follow-up question on using ChatGPT for research, however, the chief executive quipped: "I probably trust the answers that come out of ChatGPT the least of anybody on Earth."
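
The balance Altman described between creativity and perfect accuracy is, in practice, commonly exposed to developers as a "temperature" setting on text-generating models. The snippet below is a simplified, illustrative sketch with made-up candidate continuations and scores (it is not OpenAI's code or API); it shows how scaling a model's raw scores before sampling trades predictability for variety.

```python
import math
import random

def sample_with_temperature(scores, temperature=1.0):
    """Pick one option from raw model scores, softened or sharpened by temperature."""
    # Lower temperature sharpens the distribution toward the single highest-scoring
    # option (more predictable); higher temperature flattens it (more creative,
    # but with more room for fluent nonsense).
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # numerically stable softmax weights
    return random.choices(range(len(scores)), weights=weights)[0]

# Hypothetical candidate continuations with made-up scores.
candidates = ["a cautious, factual summary", "a loose paraphrase", "an invented citation"]
scores = [2.0, 1.0, 0.5]

for temp in (0.2, 1.0, 2.0):
    picks = [candidates[sample_with_temperature(scores, temp)] for _ in range(1000)]
    print(f"temperature={temp}:", {c: picks.count(c) for c in candidates})
```

At a low temperature the highest-scoring option dominates almost every time; at a high temperature the lower-scoring, more inventive options are chosen far more often.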