AI models don’t usually ask clarifying questions; they just make the best guess they can. For short back-and-forth chats this makes sense, since the user can always ask follow-up questions. But when the AI generates something longer, such as a story or code, it would be better to clarify first, since it can take a while for the user to review the full output. ChatGPT already does this for Deep Research mode, but ideally models should know when to ask clarifying questions more generally.
This issue is similar to model hallucination, where a model confidently makes up facts. That is much worse than just saying it doesn’t know, since people may rely on something that turns out to be false. The risk of hallucinations also makes it harder to rely on AI models more generally. AI chat apps have improved over time, since they can now search and “think” before outputting a final answer, but hallucinations have not been eliminated, for the following reasons:
Pretraining - Models are trained to predict how text will continue (by predicting the next token), but this can produce output that “sounds right” without actually being correct.
Post-training - Methods such as RLHF often reward more confident answers over better-calibrated ones. In fact, models often become less calibrated after post-training! This seems like something that could be improved.
Socio-technical incentives - As discussed in this post (and paper) from OpenAI, AI leaderboards reward only the number of right answers, so a model that says it’s not sure can only lose points relative to one that guesses (a rough sketch of this incentive follows below). This may partly explain why the post-training process is flawed.
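To make that incentive concrete, here is a minimal sketch (my own illustration with made-up numbers, not from the OpenAI post) comparing the expected leaderboard score of a model that guesses when unsure against one that abstains, under accuracy-only grading and under grading that penalizes confident wrong answers:

```python
# Minimal sketch with illustrative numbers: expected score per question for a model
# that guesses when unsure vs. one that abstains, under two grading schemes.

def expected_score(p_correct, answers, right, wrong, abstain):
    """Expected score given the chance the model's best guess would be right."""
    if not answers:
        return abstain
    return p_correct * right + (1 - p_correct) * wrong

p = 0.3  # assume the model's best guess is right only 30% of the time on hard questions

# Accuracy-only grading (typical leaderboard): +1 right, 0 wrong, 0 for "I don't know".
guess_acc   = expected_score(p, answers=True,  right=1, wrong=0,  abstain=0)   # 0.30
abstain_acc = expected_score(p, answers=False, right=1, wrong=0,  abstain=0)   # 0.00

# Grading that penalizes confident errors: +1 right, -1 wrong, 0 for "I don't know".
guess_pen   = expected_score(p, answers=True,  right=1, wrong=-1, abstain=0)   # -0.40
abstain_pen = expected_score(p, answers=False, right=1, wrong=-1, abstain=0)   #  0.00

print(f"accuracy-only:  guess={guess_acc:.2f}  abstain={abstain_acc:.2f}")   # guessing wins
print(f"penalize wrong: guess={guess_pen:.2f}  abstain={abstain_pen:.2f}")   # abstaining wins
```

Under accuracy-only grading, guessing always scores at least as well as saying “I don’t know,” so a model tuned against such a benchmark has no reason to ever abstain.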
AI hallucinations may be caused by flawed human incentives, but OpenAI and others are working on improving the process so that AI models become more “humble” and better calibrated over time. While the current focus is on training models to say when they don’t know something, the same work could help them learn when to ask the user clarifying questions.
As with many critiques of AI, humans fall short in similar ways and often state things with overconfident certainty. People might do this because in many social contexts it sounds better to be confident than to be accurate. Over time, maybe the ease of being fact-checked by an AI will nudge people’s incentives toward accuracy. And if you want to improve your own calibration and accuracy, try betting on prediction markets or using the tool Calibrate your judgment.
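If you want a concrete way to track that, here is a minimal sketch (my own illustration, not part of either tool) of the Brier score, a standard way to grade probabilistic predictions: record a probability for each prediction, then compare it to what actually happened.

```python
# Minimal sketch (illustrative): the Brier score grades probabilistic predictions.
# Lower is better; 0.0 is perfect, always saying 50% scores 0.25, and confident
# wrong answers are punished heavily.

def brier_score(predictions):
    """Mean squared error between stated probabilities and outcomes (1.0 if it happened)."""
    return sum((p - float(outcome)) ** 2 for p, outcome in predictions) / len(predictions)

# (stated probability that the event happens, whether it actually happened)
my_predictions = [
    (0.9, True),   # confident and right -> small penalty
    (0.9, False),  # confident and wrong -> large penalty
    (0.6, True),
    (0.5, False),
]

print(f"Brier score: {brier_score(my_predictions):.3f}")  # ~0.308 for these examples
```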