Causal Discovery Model
Large-Scale Causal Structure Learning via Transformers
LCDM is a large-scale transformer-based causal discovery model that reframes causal learning as a general-purpose inference capability. By fine-tuning pretrained LLMs and performing post-hoc analysis of attention patterns, it reconstructs causal graphs in a zero-shot and scalable manner.
Core approach
A novel architecture that turns causal learning into an inference task, similar to how pretrained LLMs handle language.
Rather than training from scratch, we fine-tune existing LLMs. Because an optimal next-token predictor needs only each variable's Markov blanket (the minimal set of variables that renders it conditionally independent of the rest), LLMs trained via next-token prediction implicitly learn these blankets. We leverage this property and apply post-hoc analysis of attention matrices to reconstruct causal graphs.
This approach enables scalable, zero-shot causal reasoning, and transforms causal learning into a reusable capability embedded in the foundation model itself.
Pretrained LLM (next-token prediction) → Attention Analysis (Jacobian gradients) → Causal Graph (zero-shot discovery)
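The attention-analysis stage of this pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the LCDM implementation: it assumes attention weights have already been extracted from a fine-tuned model (here simulated with hand-set matrices), averages them across layers and heads, and thresholds the aggregate to propose directed edges. The function name, threshold value, and aggregation scheme are illustrative choices.

```python
import numpy as np

def attention_to_graph(attn, threshold=0.3):
    """Aggregate attention matrices into a candidate causal adjacency matrix.

    attn: array of shape (n_layers, n_heads, n_vars, n_vars), where
          attn[l, h, i, j] is how strongly variable i attends to variable j.
    Returns a binary matrix A with A[j, i] = 1 proposing the edge j -> i
    (if i attends to j, then j is a candidate cause of i).
    """
    agg = attn.mean(axis=(0, 1))            # average over layers and heads
    np.fill_diagonal(agg, 0.0)              # ignore self-attention
    return (agg.T > threshold).astype(int)  # transpose: attended-to -> attender

# Toy example: 2 layers, 2 heads, 3 variables; variable 2 attends to 0 and 1.
attn = np.full((2, 2, 3, 3), 0.05)
attn[:, :, 2, 0] = 0.5   # var 2 attends strongly to var 0
attn[:, :, 2, 1] = 0.4   # var 2 attends strongly to var 1

A = attention_to_graph(attn)
print(A)  # proposed edges: 0 -> 2 and 1 -> 2
```

A real pipeline would average attention over many input sequences and calibrate the threshold; a single matrix is used here only to keep the sketch self-contained.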
Key Ideas
Four foundational insights
Causal Learning as Inference
Reframes causal discovery as a general-purpose inference task — no dataset-specific retraining or handcrafted modeling required.
Fine-tuned LLMs, Not Trained from Scratch
Leverages existing pretrained LLMs. Since next-token prediction implicitly learns Markov blankets, the causal structure is already partially encoded.
Post-hoc Attention Analysis
Applies post-hoc analysis of attention matrices and Jacobian gradients to reconstruct causal graphs from the fine-tuned model.
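Attention weights alone can overstate or understate influence, which is where Jacobian gradients serve as a complementary sensitivity signal. The sketch below is a simplified stand-in: it treats the model as a black-box function mapping variables to per-variable predictions and estimates the Jacobian by finite differences (a substitute for the backprop-computed gradients a real implementation would use), then reads large entries as candidate edges. The toy model and the 0.5 cutoff are assumptions for illustration.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-5):
    """Finite-difference Jacobian J[i, j] = d f_i / d x_j."""
    x = np.asarray(x, dtype=float)
    y0 = np.asarray(f(x))
    J = np.zeros((y0.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (np.asarray(f(xp)) - y0) / eps
    return J

# Toy "model": the prediction for variable 2 depends on variables 0 and 1,
# mimicking a network whose output heads predict each variable from the rest.
def model(x):
    return np.array([x[0], x[1], 2.0 * x[0] + 3.0 * x[1]])

J = numerical_jacobian(model, np.array([1.0, 1.0, 1.0]))
A = (np.abs(J) > 0.5).astype(int)  # large |J[i, j]| => candidate edge j -> i
np.fill_diagonal(A, 0)             # discard self-influence
print(A)  # proposed edges: 0 -> 2 and 1 -> 2
```

In practice the Jacobian would be computed with automatic differentiation and combined with the attention scores before thresholding; finite differences are used here only to avoid a deep-learning dependency.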
Zero-shot & Scalable
Enables scalable, zero-shot causal reasoning — transforming causal learning into a reusable capability embedded in the foundation model itself.
Advantages
Traditional methods vs LCDM
Traditional Causal Discovery
- Requires fresh optimization per dataset
- Starts from scratch for each domain
- Explicit conditional independence testing
- Limited scalability (< 1K variables)
LCDM Approach
- Generalizes across domains without retraining
- Builds on massive pretraining investment
- Implicit Markov blankets from next-token prediction
- Scales to 200K+ variables with attention analysis
Publications
Related research
Generalized Independent Noise Condition for Estimating Latent Variable Causal Graphs
Feng Xie, Ruichu Cai, Biwei Huang, Clark Glymour, Zhifeng Hao, Kun Zhang
NeurIPS 2020
Learning Discrete Concepts in Latent Hierarchical Models
Lingjing Kong, Guangyi Chen, Biwei Huang, Eric Xing, Yuejie Chi, Kun Zhang
NeurIPS 2024
Differentiable Causal Discovery for Latent Hierarchical Causal Models
Parjanya Prajakta Prashant, Ignavier Ng, Kun Zhang, Biwei Huang
arXiv preprint arXiv:2411.19556 (2024)
Latent Hierarchical Causal Structure Discovery with Rank Constraints
Biwei Huang, Charles Jia Han Low, Feng Xie, Clark Glymour, Kun Zhang
NeurIPS