Conformal Information Pursuit for
Interactively Guiding Large Language Models

University of Pennsylvania
NeurIPS 2025

Abstract

A significant use case of instruction-finetuned Large Language Models (LLMs) is to solve question-answering tasks interactively. In this setting, an LLM agent is tasked with making a prediction by sequentially querying relevant information from the user, as opposed to a single-turn conversation. This paper explores sequential querying strategies that aim to minimize the expected number of queries. One such strategy is Information Pursuit (IP), a greedy algorithm that at each iteration selects the query that maximizes information gain or equivalently minimizes uncertainty. However, obtaining accurate estimates of mutual information or conditional entropy for LLMs is very difficult in practice due to over- or under-confident LLM probabilities, which leads to suboptimal query selection and predictive performance. To better estimate the uncertainty at each iteration, we propose Conformal Information Pursuit (C-IP), an alternative approach to sequential information gain based on conformal prediction sets. More specifically, C-IP leverages a relationship between prediction sets and conditional entropy at each iteration to estimate uncertainty based on the average size of conformal prediction sets. In contrast to conditional entropy, we find that conformal prediction sets are a distribution-free and robust method of measuring uncertainty. Experiments with 20 Questions show that C-IP obtains better predictive performance and shorter query-answer chains compared to previous approaches to IP and uncertainty-based chain-of-thought methods. Furthermore, extending to an interactive medical setting between a doctor and a patient on the MediQ dataset, C-IP achieves competitive performance with direct single-turn prediction while offering greater interpretability.

High-Level Summary

Interactive Question Answering with LLMs

Imagine a setting where a doctor is trying to diagnose a patient. In contrast to the typical question answering (QA) setting, the information necessary to make a confident prediction may not all be readily available at once. Instead, the doctor must ask a short sequence of informative questions about the patient in order to make a proper diagnosis. Our setting focuses on interactive question answering tasks with LLMs, where an LLM asks questions in order of their information gain about the task. That is, at each iteration the LLM asks the query that reduces uncertainty the most. To implement this, we build on an information-theoretic framework called Information Pursuit (IP).
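To make the greedy selection rule concrete, below is a minimal Python sketch of a single IP step. It assumes a hypothetical helper llm_answer_probs(history, query) that returns, for each possible answer to a candidate query, the LLM's probability of that answer together with the label posterior that would result from appending it to the history; these names are illustrative and not the implementation used in the paper.

import numpy as np

def entropy(p):
    # Shannon entropy of a probability vector (natural log).
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def select_next_query(history, queries, llm_answer_probs):
    # Greedy IP step: pick the query that minimizes the expected posterior
    # entropy over the label, i.e. maximizes the estimated information gain.
    best_query, best_expected_entropy = None, float("inf")
    for q in queries:
        expected_entropy = sum(
            p_answer * entropy(label_posterior)
            for p_answer, label_posterior in llm_answer_probs(history, q)
        )
        if expected_entropy < best_expected_entropy:
            best_query, best_expected_entropy = q, expected_entropy
    return best_query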

Uncertainty Estimation for LLMs

As an information-theoretic framework, IP requires estimating probabilities, which are obtained from LLMs as "next-token probabilities" of the answer choices (e.g., the probability of the token "A"), and estimating the mutual information (or entropy) between the task and a query. However, it is commonly observed that probabilities from LLMs are not well-calibrated; they can be over- or under-confident, yielding poor estimates of mutual information. This subsequently leads to suboptimal query selection and predictive performance (dashed curve). To address the issue of miscalibrated probabilities, we leverage Conformal Prediction, a distribution-free, finite-sample uncertainty quantification framework. Rather than estimating mutual information directly via entropy, we estimate it using the size of conformal prediction sets. By calibrating these measures of uncertainty, we obtain better predictive performance and better measures of uncertainty (solid curve).
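As a rough illustration of the conformal ingredient, the sketch below builds split-conformal prediction sets from LLM next-token probabilities and uses the set size as the uncertainty measure. The calibration scores, label probabilities, and helper names are hypothetical, and the exact nonconformity score and calibration procedure in the paper may differ.

import numpy as np

def conformal_quantile(cal_scores, alpha=0.1):
    # Split-conformal threshold from calibration nonconformity scores,
    # where each score is 1 - p(true label | context) under the LLM.
    n = len(cal_scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, level, method="higher")

def prediction_set(label_probs, qhat):
    # Keep every label whose nonconformity score 1 - p(y | x) falls below the threshold.
    return [y for y, p in label_probs.items() if 1.0 - p <= qhat]

# Toy usage with made-up calibration scores and next-token probabilities.
cal_scores = np.array([0.30, 0.45, 0.20, 0.60, 0.75, 0.25, 0.50, 0.40, 0.65])
qhat = conformal_quantile(cal_scores, alpha=0.1)
label_probs = {"A": 0.45, "B": 0.30, "C": 0.15, "D": 0.10}
print(prediction_set(label_probs, qhat))  # -> ['A', 'B']
# The (average) size of such sets, rather than entropy, is the uncertainty
# measure that guides query selection in C-IP.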

Example of Interactive Medical Question Answering

Below we show a more concrete example of medical question answering. Before querying any information, the doctor LLM has access to some initial information about the patient: a brief patient description, the medical question, and the multiple-choice options for the question.

Then, based on the information it currently has access to, the doctor LLM selects the most informative query from a set of possible queries, which in this case is "What is the moisture level of the patient's mucous membranes?". The patient LLM then responds to the query with "She has dry mucous membranes" (the patient's full context is hidden from the doctor).

Conditioned on the answer to the first query, the doctor then asks the next most informative query, and the patient again answers. This process continues until the uncertainty estimated from the LLM drops below a user-chosen threshold, at which point the doctor makes a final prediction, in this case that the answer is D.
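Putting the pieces together, here is a hedged sketch of the overall doctor-patient loop with a set-size stopping rule. It reuses prediction_set and select_next_query from the sketches above, and assumes hypothetical helpers ask_patient and label_probs_fn; the actual stopping criterion and prompting in the paper may differ.

def interactive_qa(initial_info, queries, ask_patient, llm_answer_probs,
                   label_probs_fn, qhat, max_queries=10, stop_size=1):
    # Doctor-patient loop: query until the conformal prediction set is small
    # enough (low uncertainty) or the query budget runs out, then predict.
    history = [initial_info]
    for _ in range(max_queries):
        probs = label_probs_fn(history)            # LLM label probabilities given the history
        pred_set = prediction_set(probs, qhat)     # conformal set from the sketch above
        if len(pred_set) <= stop_size:             # uncertainty low enough: stop querying
            break
        q = select_next_query(history, queries, llm_answer_probs)
        history.append((q, ask_patient(q)))        # patient LLM answers; its full context stays hidden
    final_probs = label_probs_fn(history)
    return max(final_probs, key=final_probs.get), history   # e.g. option "D" in the example above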

Concluding Remarks

For technical details about our method, please refer to Sections 2 and 3 of our paper. In our experiments, we consider different settings, including closed query sets (a fixed set of possible queries) and open query sets (queries sampled from an LLM at each iteration). For baselines, we compare with state-of-the-art chain-of-thought methods such as Uncertainty-of-Thought, as well as standard probability calibration methods. We also evaluate our method across multiple LLMs and analyze the calibration results in the appendix.

Poster

BibTeX

@inproceedings{chanconformal,
  title={Conformal Information Pursuit for Interactively Guiding Large Language Models},
  author={Chan, Kwan Ho Ryan and Ge, Yuyan and Dobriban, Edgar and Hassani, Hamed and Vidal, Rene},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025}
}