Multimodal QUD: Inquisitive Questions from Scientific Figures

A dataset and benchmark for questions that arise from figure-text interaction in scientific papers and require reasoning across both modalities.

The paper and data are available now; links to the code and model releases will be added once they are published.

[Figure: attention map]
MQUD question: Why does the attention shift from the text to the image after the intervention?
[Figure: disease ranking]
MQUD question: Why do Llama3's logits show a different pattern from real-world disease prevalence?
[Figure: embedding utilization]
MQUD question: What are the implications of the middle panel's shape on dimensional utilization?

Overview

Existing figure QA often asks what is directly visible. Multimodal QUD asks what scientific readers wonder next.

Existing scientific figure QA benchmarks often focus on extracting visible information: labels, values, captions, or direct visual comparisons. Multimodal QUD (MQUD) instead targets questions raised by figure-text interaction, including mechanisms, evidence, implications, and the role a figure plays in the paper's argument.

Input: Figure plus paper context

Questions are conditioned on the title, abstract, figure, caption, and surrounding paper passages.

Target: Researcher-like curiosity

The task captures open-ended questions that emerge when visual patterns interact with a paper's claims.

Grounding: Answer traces

Each example pairs the visible figure with answer evidence from the surrounding paper context.
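
As a rough illustration of the input and grounding described above, one example could be represented as a record like the one below. This is only a sketch: the class name `MQUDExample`, its field names, and the helper `build_text_context` are hypothetical and are not taken from the released data format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MQUDExample:
    """Hypothetical record layout; field names are illustrative, not the released schema."""
    title: str
    abstract: str
    figure_path: str               # path to the figure image
    caption: str
    context_passages: List[str]    # surrounding paper passages
    question: str                  # the open-ended question raised by the figure
    answer_evidence: List[str] = field(default_factory=list)  # passages that ground the answer


def build_text_context(ex: MQUDExample) -> str:
    """Join the textual inputs into one string; the figure itself is passed separately."""
    parts = [
        f"Title: {ex.title}",
        f"Abstract: {ex.abstract}",
        f"Caption: {ex.caption}",
    ]
    parts += [f"Passage {i + 1}: {p}" for i, p in enumerate(ex.context_passages)]
    return "\n".join(parts)


if __name__ == "__main__":
    # Placeholder values only; no real paper content is reproduced here.
    example = MQUDExample(
        title="<paper title>",
        abstract="<abstract>",
        figure_path="<path/to/figure.png>",
        caption="<figure caption>",
        context_passages=["<passage before the figure>", "<passage after the figure>"],
        question="<open-ended question about the figure>",
    )
    print(build_text_context(example))
```

Keeping the figure path separate from the text context mirrors the split between the two modalities: a model would receive the image alongside the assembled text rather than inside it.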

Representative questions

From extracting answers to asking why

Representative questions show how MQUD moves beyond reading off values or identifying the best-performing method, toward explaining why a pattern in a figure matters.

Project links

arXiv preprint: https://arxiv.org/abs/2604.23733

BibTeX:
@misc{wu2026multimodalqud,
  title={Multimodal QUD: Inquisitive Questions from Scientific Figures},
  author={Wu, Yating and Rudman, William and Govindarajan, Venkata S. and Dimakis, Alexandros G. and Li, Junyi Jessy},
  year={2026},
  eprint={2604.23733},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2604.23733}
}