Causal Discovery Model
Large-Scale Causal Structure Learning via Transformers
LCDM is a large-scale transformer-based causal discovery model that reframes causal learning as a general-purpose inference capability. By fine-tuning pretrained LLMs and performing post-hoc analysis of attention patterns, it reconstructs causal graphs in a zero-shot and scalable manner.
Core approach
A novel architecture that turns causal learning into an inference task, similar to how pretrained LLMs handle language.
Rather than training from scratch, we fine-tune existing LLMs. Because an optimal next-token predictor needs only each variable's Markov blanket (the minimal set of variables that renders it conditionally independent of the rest), LLMs trained via next-token prediction implicitly learn these blankets. We leverage this property and apply post-hoc analysis of attention matrices to reconstruct causal graphs.
This approach enables scalable, zero-shot causal reasoning, and transforms causal learning into a reusable capability embedded in the foundation model itself.
Pretrained LLM (next-token prediction) → Attention Analysis (Jacobian gradients) → Causal Graph (zero-shot discovery)
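The attention-analysis stage of this pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the LCDM implementation: it assumes attention weights have already been extracted from a fine-tuned model (here simulated with hand-set matrices), averages them across layers and heads, and thresholds the aggregate to propose directed edges. The function name, threshold value, and aggregation scheme are illustrative choices.

```python
import numpy as np

def attention_to_graph(attn, threshold=0.3):
    """Aggregate attention matrices into a candidate causal adjacency matrix.

    attn: array of shape (n_layers, n_heads, n_vars, n_vars), where
          attn[l, h, i, j] is how strongly variable i attends to variable j.
    Returns a binary matrix A with A[j, i] = 1 proposing the edge j -> i
    (if i attends to j, then j is a candidate cause of i).
    """
    agg = attn.mean(axis=(0, 1))            # average over layers and heads
    np.fill_diagonal(agg, 0.0)              # ignore self-attention
    return (agg.T > threshold).astype(int)  # transpose: attended-to -> attender

# Toy example: 2 layers, 2 heads, 3 variables; variable 2 attends to 0 and 1.
attn = np.full((2, 2, 3, 3), 0.05)
attn[:, :, 2, 0] = 0.5   # var 2 attends strongly to var 0
attn[:, :, 2, 1] = 0.4   # var 2 attends strongly to var 1

A = attention_to_graph(attn)
print(A)  # proposed edges: 0 -> 2 and 1 -> 2
```

A real pipeline would average attention over many input sequences and calibrate the threshold; a single matrix is used here only to keep the sketch self-contained.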
Key Ideas
Four foundational insights
Causal Learning as Inference
Reframes causal discovery as a general-purpose inference task — no dataset-specific retraining or handcrafted modeling required.
Fine-tuned LLMs, Not Trained from Scratch
Leverages existing pretrained LLMs. Since next-token prediction implicitly learns Markov blankets, the causal structure is already partially encoded.
Post-hoc Attention Analysis
Applies post-hoc analysis of attention matrices and Jacobian gradients to reconstruct causal graphs from the fine-tuned model.
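Attention weights alone can overstate or understate influence, which is where Jacobian gradients serve as a complementary sensitivity signal. The sketch below is a simplified stand-in: it treats the model as a black-box function mapping variables to per-variable predictions and estimates the Jacobian by finite differences (a substitute for the backprop-computed gradients a real implementation would use), then reads large entries as candidate edges. The toy model and the 0.5 cutoff are assumptions for illustration.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-5):
    """Finite-difference Jacobian J[i, j] = d f_i / d x_j."""
    x = np.asarray(x, dtype=float)
    y0 = np.asarray(f(x))
    J = np.zeros((y0.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (np.asarray(f(xp)) - y0) / eps
    return J

# Toy "model": the prediction for variable 2 depends on variables 0 and 1,
# mimicking a network whose output heads predict each variable from the rest.
def model(x):
    return np.array([x[0], x[1], 2.0 * x[0] + 3.0 * x[1]])

J = numerical_jacobian(model, np.array([1.0, 1.0, 1.0]))
A = (np.abs(J) > 0.5).astype(int)  # large |J[i, j]| => candidate edge j -> i
np.fill_diagonal(A, 0)             # discard self-influence
print(A)  # proposed edges: 0 -> 2 and 1 -> 2
```

In practice the Jacobian would be computed with automatic differentiation and combined with the attention scores before thresholding; finite differences are used here only to avoid a deep-learning dependency.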
Zero-shot & Scalable
Enables scalable, zero-shot causal reasoning — transforming causal learning into a reusable capability embedded in the foundation model itself.
Advantages
Traditional methods vs LCDM
Traditional Causal Discovery
- Requires fresh optimization per dataset
- Starts from scratch for each domain
- Explicit conditional independence testing
- Limited scalability (< 1K variables)
LCDM Approach
- Generalizes across domains without retraining
- Builds on massive pretraining investment
- Implicit Markov blankets from next-token prediction
- Scales to 200K+ variables with attention analysis
Publications
Related research
Generalized Independent Noise Condition for Estimating Latent Variable Causal Graphs
Feng Xie, Ruichu Cai, Biwei Huang, Clark Glymour, Zhifeng Hao, Kun Zhang
NeurIPS 2020
Learning Discrete Concepts in Latent Hierarchical Models
Lingjing Kong, Guangyi Chen, Biwei Huang, Eric Xing, Yuejie Chi, Kun Zhang
NeurIPS 2024
Differentiable Causal Discovery for Latent Hierarchical Causal Models
Parjanya Prajakta Prashant, Ignavier Ng, Kun Zhang, Biwei Huang
arXiv preprint arXiv:2411.19556 (2024)
Latent Hierarchical Causal Structure Discovery with Rank Constraints
Biwei Huang, Charles Jia Han Low, Feng Xie, Clark Glymour, Kun Zhang
NeurIPS