Large language models (LLMs) have catalyzed a new era of unprecedented innovation, leading to new products and the enhancement or reinvention of established ones for both businesses and the general public. LLMs have become integral to our daily personal and professional routines, helping us query information or summarize extensive data.
However, their capabilities, impressive as they are, come with imperfections, notably concerning the reliability and accuracy of the content LLMs generate. Because these models are known to sometimes hallucinate, they must be used with care, particularly for tasks where accuracy is critical.
Cisco Research is committed to the principles of responsible AI and has championed both academic and internal AI research initiatives aimed at addressing LLM hallucination and reliability. This effort has produced numerous scholarly papers and open-source contributions presented at premier AI conferences.
Recently, Cisco Research held a PI (Principal Investigator) summit featuring four distinguished NLP (natural language processing) researchers, who presented their latest research on detecting and mitigating LLM hallucinations. Below are insights from our panel of experts.
William Wang from the University of California, Santa Barbara led with a presentation featuring two of his notable research initiatives. He began by exploring the complexities of in-context learning and prompt engineering algorithm design. In a recent NeurIPS 2023 publication, the authors examine in-context learning through the lens of latent variables, positing that LLMs inherently serve as topic models. In the latter segment of his talk, he shifted focus to the Logic-LM project, highlighted at EMNLP 2023, which underscores the potential of symbolic reasoning to enhance the reasoning capabilities and truthfulness of LLMs in selected areas. The team adopted techniques such as logic programming, first-order logic, constraint satisfaction problem (CSP) solvers, and satisfiability (SAT) solvers: the LLM reframes a problem into a structured, symbolic representation so that a solver suited to the problem domain can carry out the reasoning.
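To make that division of labor concrete, here is a minimal sketch of the final solving step, not the Logic-LM code itself: it assumes the LLM has already translated a word problem into propositional formulas, and it hands those formulas to the Z3 solver to check entailment.

```python
# Minimal sketch (not the Logic-LM implementation): an external solver does the
# reasoning once the LLM has produced a symbolic translation of the problem.
from z3 import Bool, Implies, Not, Solver, unsat  # requires the z3-solver package

# Hypothetical LLM output for: "If it rains, the picnic is cancelled.
# It is raining. Is the picnic cancelled?"
rains = Bool("rains")
cancelled = Bool("picnic_cancelled")
premises = [Implies(rains, cancelled), rains]
conclusion = cancelled

def entails(premises, conclusion) -> bool:
    """Premises entail the conclusion iff premises + NOT(conclusion) is unsatisfiable."""
    solver = Solver()
    solver.add(*premises)
    solver.add(Not(conclusion))
    return solver.check() == unsat

print(entails(premises, conclusion))  # True: the symbolic answer is "yes"
```

The appeal of this pattern is that the solver's answer is verifiable, which is precisely the property that helps curb hallucinated reasoning.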
Kai Shu from the Illinois Institute of Technology brought a new perspective to the conversation about hallucinations in LLMs. Kai highlighted recent research shared at the EMNLP conference on combating misinformation in the age of LLMs, underscoring the potential for malicious actors to deliberately provoke AI-generated hallucinations. Shu and his co-authors argued that LLMs are a double-edged sword in the misinformation domain: they can detect misinformation, but they can also, unfortunately, generate it.
With a growing concern over the misuse of LLMs to craft sophisticated and hard-to-detect misinformation, the team addressed three main aspects: detection, mitigation, and source identification of misinformation. They proposed creative solutions to tackle these intricate challenges.
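As an illustration of the detection aspect only, and not the specific methods from Shu's EMNLP work, an LLM can be prompted to judge a claim against retrieved evidence. In the sketch below, the prompt wording and the `call_llm` function are hypothetical stand-ins for whatever chat-completion client is in use.

```python
# Illustrative sketch of LLM-based misinformation detection; the prompt and
# call_llm() are assumptions, not the authors' pipeline.
DETECTION_PROMPT = """You are a careful fact-checking assistant.
Claim: {claim}
Evidence: {evidence}
Respond with one label, SUPPORTED, REFUTED, or NOT ENOUGH INFO, followed by a
one-sentence justification grounded only in the evidence above."""

def check_claim(claim: str, evidence: str, call_llm) -> str:
    """Ask an LLM to label a claim against retrieved evidence."""
    return call_llm(DETECTION_PROMPT.format(claim=claim, evidence=evidence))
```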
Danqi Chen from Princeton University recently showcased her group's research on generating and evaluating citations at the EMNLP conference. A key way to make LLM-powered search more useful is to weave dependable citations through the different segments of the model's output. This not only eases verification of the response but is also crucial for its reliability.
Evaluating the merit of such citations is no simple feat. To address this, Danqi introduced a benchmark named ALCE (Automatic LLMs' Citation Evaluation). This system promises to standardize the assessment of citation quality and, as a bonus, paves the way for more sophisticated citation mechanisms in LLM-driven search.
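One common way to automate this kind of check, in the spirit of ALCE, is to ask a natural language inference (NLI) model whether a cited passage actually entails the generated sentence. The sketch below is illustrative: the model choice and the decision rule are assumptions, not ALCE's exact configuration.

```python
# Illustrative citation-support check using an off-the-shelf NLI model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "microsoft/deberta-large-mnli"  # model choice is an assumption
tok = AutoTokenizer.from_pretrained(MODEL)
nli = AutoModelForSequenceClassification.from_pretrained(MODEL)

def citation_supports(cited_passage: str, generated_sentence: str) -> bool:
    """Return True if the NLI model judges that the cited passage entails the sentence."""
    inputs = tok(cited_passage, generated_sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nli(**inputs).logits
    label = nli.config.id2label[logits.argmax(dim=-1).item()]
    return label.upper() == "ENTAILMENT"
```

Aggregating such judgments over every sentence-citation pair in a response yields recall- and precision-style scores for citation quality.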
Finally, Huan Sun from The Ohio State University brought a refreshing perspective on the phenomenon of hallucination in large language models. Contrary to common belief, she argued that hallucination shouldn't be seen as a glitch in LLMs but rather as an inherent feature that, when leveraged appropriately, can offer significant value. She likened it to human perception, which itself can be viewed as a kind of controlled hallucination.
Later in her presentation, Sun introduced two of her team's projects: Mind2Web, the first dataset for developing and evaluating generalist agents for the web, and SeeAct, which focuses on providing visual grounding for web agents.
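For a sense of what such a generalist web agent has to do, here is a deliberately simplified agent loop; `browser` and `llm_choose_action` are hypothetical stand-ins rather than APIs from the Mind2Web or SeeAct codebases.

```python
# Simplified observe-decide-act loop for a web agent; all interfaces here are
# illustrative placeholders, not code from Mind2Web or SeeAct.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str       # e.g. "click", "type", "stop"
    target: str     # element identifier in the page snapshot
    text: str = ""  # text to type, if any

def run_web_task(task: str, browser, llm_choose_action, max_steps: int = 10) -> None:
    """Iteratively observe the page, ask the model for the next action, and act."""
    for _ in range(max_steps):
        observation = browser.snapshot()             # DOM or screenshot of the current page
        action = llm_choose_action(task, observation)
        if action.kind == "stop":
            break
        browser.execute(action)                      # click or type on the chosen element
```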
You can find in-depth coverage of these presentations and the lively panel discussion that concluded the summit on Outshift’s YouTube channel.
Subscribe to Outshift’s YouTube channel for more featured Cisco Research summits.