Why can't chatbots always give accurate information?

The Chronify

ChatGPT and other artificial intelligence tools often provide answers that are completely made-up or false. In the world of AI, this problem is known as “hallucination.” But why do chatbots hallucinate? Why don’t they simply admit when they don’t know something?

Recently, a research paper by OpenAI identified the root cause of this problem. But at the same time, it revealed a disturbing truth: one that suggests this issue may never be fully solvable, at least not for everyday users.

The paper, backed by strong mathematical reasoning, shows that there is no connection between an AI model’s confidence and the accuracy of its responses. In other words, these mistakes aren’t accidents; they are an inevitable part of the model’s architecture.

Until now, it was believed that such errors might occur due to flaws in the training data.
But the researchers have proven that even if you train the model with perfect data, the problem will still persist.

 

The root cause lies in how language models work. Chatbots don’t truly “know” things; instead, they predict the next word in a sentence based solely on probability.

For example, when generating the sentence “I eat rice,” the chatbot first considers the word “I,” then adds “eat,” and finally guesses what word should come next, in this case “rice.” Each word is chosen based on a statistical guess of what is most likely to follow the previous one. The entire process is based on prediction, not understanding, and that’s where things can go wrong.
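To make the idea concrete, here is a deliberately tiny Python sketch of what “predicting the next word” means. It is illustrative only, with made-up probabilities; real chatbots use large neural networks over enormous vocabularies, not a hand-written lookup table.

```python
import random

# Hypothetical probabilities of the next word given the words so far
# (made-up numbers for the "I eat rice" example).
next_word_probs = {
    ("I",): {"eat": 0.5, "sleep": 0.3, "run": 0.2},
    ("I", "eat"): {"rice": 0.6, "bread": 0.3, "glass": 0.1},
}

def pick_next_word(context):
    """Pick the next word in proportion to its estimated probability."""
    probs = next_word_probs[tuple(context)]
    words = list(probs.keys())
    weights = list(probs.values())
    return random.choices(words, weights=weights)[0]

sentence = ["I"]
while tuple(sentence) in next_word_probs:
    sentence.append(pick_next_word(sentence))

# Usually prints "I eat rice", but occasionally a fluent-sounding wrong guess
# like "I eat glass": the model only knows what is likely, not what is true.
print(" ".join(sentence))
```

Even in this toy version, a low-probability but wrong continuation can slip through, which is the seed of a hallucination.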

In the tests used to evaluate chatbots, an AI that says “I don’t know” gets a score of zero. But if it gives a completely wrong answer, it also gets zero. So, from the model’s perspective, guessing is always better than admitting ignorance.

Researchers have found that when trying to generate a full sentence, the model makes at least twice as many errors as it does when answering a simple yes/no question about whether a statement is correct. Why? Because with each additional word, there’s more room for error. Simply put, the rate of hallucination depends on how well the AI can distinguish between correct and incorrect responses.

But in many cases, making that distinction is extremely difficult, which makes hallucination almost inevitable.
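A rough back-of-the-envelope calculation, an illustration rather than a figure from the paper, shows how quickly small per-word errors compound over a longer answer:

```python
# Rough illustration (not the paper's exact result): even if each predicted
# word is right 98% of the time, the chance that an entire answer is right
# drops off quickly as the answer gets longer.
per_word_accuracy = 0.98

for length in (1, 5, 20, 50):
    whole_answer_accuracy = per_word_accuracy ** length
    print(f"{length:>2} words: about {whole_answer_accuracy:.0%} chance the whole answer is right")
```

At 50 words, even that generous 98% per-word accuracy leaves only about a one-in-three chance that the full answer contains no mistake.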

The study also found that the less frequently an AI sees a piece of information during training, the more likely it is to hallucinate when asked about it later.

For instance, in an experiment involving the birthdays of famous people, if a person’s birthday appeared only once in the training data, the AI gave a wrong answer at least 20% of the time.

When researchers asked advanced models about the birthday of Adam Kalai (one of the paper's authors), a single model gave three completely different and incorrect dates in three separate attempts.

 

Why Can’t AI Just Say “I Don’t Know”?

The big question now is: why does this problem persist even after training? Usually, before an AI is released to the public, it goes through a phase of refinement using human feedback. But according to the research paper, this is where the core problem lies.

After analyzing evaluation systems used by major AI developers like Google, OpenAI, and others, the researchers found that 9 out of 10 major AI benchmarks are designed in such a way that if an AI expresses uncertainty or says “I don’t know,” it receives zero points.

This system of giving zero points for uncertainty is at the heart of the hallucination problem. Let’s break it down further.

Mathematically, it has been shown that if AI systems were rewarded or encouraged to admit uncertainty, they would naturally guess less and hallucinate less. But right now, saying “I don’t know” gets the model a zero. And giving a completely wrong answer? Also zero.

So what’s the best strategy for the AI? Always guess.

Imagine you’re asked three questions. If you say “I don’t know,” you get zero. If you give a wrong answer, same result: zero. But if you guess and get it right, you score points. Naturally, you’ll be tempted to guess every time.

This is exactly what AI models do. Rather than admit uncertainty, they make a confident guess, hoping it turns out to be right.

The researchers proved mathematically that in these types of scoring systems, the expected score for guessing is always higher than the score for not answering, no matter how low the chances of being right.
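A quick expected-value sketch, with made-up probabilities that simplify the paper’s argument, shows why: when wrong answers and “I don’t know” both score zero, guessing can never do worse than staying silent.

```python
# Simplified sketch of binary scoring: 1 point if correct, 0 if wrong,
# and 0 if the model admits it doesn't know. Probabilities are made up.
def expected_score_if_guessing(p_correct):
    return p_correct * 1 + (1 - p_correct) * 0

def expected_score_if_abstaining():
    return 0

for p_correct in (0.9, 0.5, 0.1, 0.01):
    print(f"chance of being right {p_correct:.0%}: "
          f"guess = {expected_score_if_guessing(p_correct):.2f}, "
          f"'I don't know' = {expected_score_if_abstaining():.2f}")

# However small the chance of being right, guessing never scores lower,
# so the optimal strategy under this rule is to always answer.
```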

 

The Solution That Could Break Everything

OpenAI has proposed a potential solution: before responding, the AI should evaluate its own confidence level, and scoring or evaluation systems should reward or penalize it accordingly.

For example, the AI could be told:
“Only answer if you're at least 75% confident. Because giving a wrong answer will cost you 3 points, but a correct one will earn you just 1 point.”

Mathematically, this system encourages the AI to express uncertainty instead of guessing, which would significantly reduce hallucination.
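Plugging in the example numbers above (+1 point for a correct answer, -3 for a wrong one, 0 for “I don’t know”), a short calculation, sketched here rather than taken from the paper, shows why 75% confidence is the break-even point:

```python
# Sketch of confidence-threshold scoring with the example numbers above:
# +1 point for a correct answer, -3 for a wrong one, 0 for abstaining.
def expected_score_if_guessing(p_correct, reward=1.0, penalty=3.0):
    return p_correct * reward - (1 - p_correct) * penalty

for p_correct in (0.95, 0.80, 0.75, 0.50, 0.10):
    score = expected_score_if_guessing(p_correct)
    better = "guess" if score > 0 else "say 'I don't know'"
    print(f"confidence {p_correct:.0%}: expected score {score:+.2f} -> {better}")

# Break-even confidence = penalty / (reward + penalty) = 3 / 4 = 75%.
# Below that, admitting uncertainty beats guessing, which is exactly the
# behaviour this kind of scoring is meant to encourage.
```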

But here’s the catch: this could drastically affect user experience.

Imagine asking ChatGPT ten questions, and it replies to three of them with “I don’t know.”
Would you still want to use it?

For users who are accustomed to getting confident answers to anything they ask, this kind of system might feel frustrating. Many might quickly stop using such a chatbot, thinking:
“Why ask ChatGPT when I can just Google it?”

 

The Real Obstacle: Economics

While user dissatisfaction with “I don’t know” responses could potentially be managed, a much bigger challenge lies in the economics: not human economics, but computational economics.

Creating an AI that is honest and self-aware, one that knows what it doesn’t know and admits it, would require vastly more computing power and time than current systems. And if such a system were to answer millions of questions per day, the cost would run into billions of dollars.

If the AI replies “I don’t know” whenever it's unsure, and you don’t get the answer you were looking for, you might start to wonder: what’s the point of spending so much money on it?

This kind of sophisticated system may be justifiable in specialized fields like chip design or medical diagnosis, where a single wrong answer could cost millions of dollars. In such cases, the extra cost is worth it.

But for general users, especially those who want instant answers at low or no cost, this economic model simply doesn’t work.

In reality, it’s practically impossible to build such an AI for the general public. Today’s AI tools already cost between $2,500 and $25,000 per month for enterprise use, and a more advanced, cautious chatbot might cost somewhere around 70,000–80,000 BDT ($600–$700) per month, though the research paper does not give an exact figure.

Would the average person be willing to pay that much per month just to use a chatbot?

And what would they get in return? A chatbot that, instead of confidently answering, frequently says:
“I don’t know.”
If you keep getting no answers despite paying a premium, you’d likely give up on using it, or go back to older, cheaper versions. In either case, the AI company loses.

And that’s why stopping chatbots from "making things up" may be nearly impossible.
