
Causal Discovery Model
Large-Scale Causal Structure Learning via Transformers

LCDM is a large-scale, transformer-based causal discovery model that reframes causal learning as a general-purpose inference capability. By fine-tuning pretrained LLMs and analyzing their attention patterns post hoc, it reconstructs causal graphs zero-shot and at scale.

Core approach

A novel architecture that turns causal learning into an inference task, similar to how pretrained LLMs handle language.

Rather than training from scratch, we fine-tune existing LLMs. Because next-token prediction rewards attending to exactly the variables that make a target conditionally independent of everything else, LLMs implicitly learn each variable's Markov blanket. We leverage this property and apply post-hoc analysis of attention matrices to reconstruct causal graphs.

This approach enables scalable, zero-shot causal reasoning, and transforms causal learning into a reusable capability embedded in the foundation model itself.
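The attention-analysis step can be sketched in miniature. The aggregation rule below (average attention over layers, zero the diagonal, threshold) is an illustrative assumption, not LCDM's published procedure; the attention matrices themselves would come from the fine-tuned transformer.

```python
import numpy as np

def attention_to_graph(attn_layers, threshold=0.3):
    """Toy sketch: turn per-layer attention matrices into an adjacency matrix.

    attn_layers: list of (n_vars, n_vars) arrays where attn_layers[l][i, j]
    is the attention weight position i places on position j in layer l.
    We average over layers, drop self-attention, and threshold; LCDM's
    actual aggregation and edge-orientation rules are assumed here.
    """
    avg = np.mean(np.stack(attn_layers), axis=0)
    np.fill_diagonal(avg, 0.0)            # ignore self-attention
    return (avg > threshold).astype(int)  # propose edge j -> i if i attends to j

# hypothetical attention from a 2-layer model over 3 variables
layers = [np.array([[0.8, 0.1, 0.1],
                    [0.6, 0.3, 0.1],
                    [0.1, 0.1, 0.8]]),
          np.array([[0.7, 0.2, 0.1],
                    [0.5, 0.4, 0.1],
                    [0.2, 0.1, 0.7]])]
adj = attention_to_graph(layers)  # only variable 1 attends strongly to variable 0
```

In this toy input, variable 1 consistently attends to variable 0 across layers, so the only surviving edge is 0 → 1.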

Pipeline: Pretrained LLM (next-token prediction) → Attention Analysis (Jacobian gradients) → Causal Graph (zero-shot discovery)

Key Ideas

Four foundational insights

01

Causal Learning as Inference

Reframes causal discovery as a general-purpose inference task — no dataset-specific retraining or handcrafted modeling required.

02

Fine-tuned LLMs, Not Trained from Scratch

Leverages existing pretrained LLMs. Because models trained with next-token prediction implicitly learn Markov blankets, the causal structure is already partially encoded.

03

Post-hoc Attention Analysis

Applies post-hoc analysis of attention matrices and Jacobian gradients to reconstruct causal graphs from the fine-tuned model.

04

Zero-shot & Scalable

Enables scalable, zero-shot causal reasoning — transforming causal learning into a reusable capability embedded in the foundation model itself.
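The Jacobian-gradient idea (key idea 03) can be illustrated on a toy function standing in for the model: if output i is sensitive to input j, we propose a dependency edge j → i. The finite-difference estimator and the threshold below are assumptions for illustration; LCDM's actual analysis operates on transformer activations.

```python
import numpy as np

def jacobian_edges(f, x, eps=1e-5, threshold=1e-3):
    """Estimate the Jacobian of f at x by finite differences and threshold
    its magnitudes into a binary dependency matrix. A toy stand-in for
    gradient-based attribution over a fine-tuned model."""
    y0 = f(x)
    J = np.zeros((y0.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (f(xp) - y0) / eps  # sensitivity of each output to input j
    return (np.abs(J) > threshold).astype(int)

# hypothetical structural model: x2 is a function of x0 and x1
def toy_model(x):
    return np.array([x[0], x[1], 2.0 * x[0] + x[1] ** 2])

adj = jacobian_edges(toy_model, np.array([1.0, 2.0, 0.0]))
```

Here the estimated Jacobian recovers that the third output depends on the first two inputs but not on the third.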

Advantages

Traditional methods vs LCDM

Traditional Causal Discovery

  • Requires fresh optimization per dataset
  • Starts from scratch for each domain
  • Explicit conditional independence testing
  • Limited scalability (< 1K variables)

LCDM Approach

  • Generalizes across domains without retraining
  • Builds on massive pretraining investment
  • Implicit Markov blankets from next-token prediction
  • Scales to 200K+ variables with attention analysis
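The "implicit Markov blankets" claim above has a simple statistical analogue: in a chain X → Y → Z, the Markov blanket of Z is {Y}, so an optimal predictor of Z given (X, Y) places (near-)zero weight on X. The least-squares demo below is an illustrative analogy, not LCDM's training procedure.

```python
import numpy as np

# Chain X -> Y -> Z with linear-Gaussian mechanisms.
rng = np.random.default_rng(42)
n = 20000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
z = -1.5 * y + rng.normal(size=n)

# Regress z on both x and y: the best predictor ignores x,
# because y screens z off from x (y is z's Markov blanket here).
A = np.column_stack([x, y])
coef, *_ = np.linalg.lstsq(A, z, rcond=None)
# coef[0] ~ 0 (weight on x), coef[1] ~ -1.5 (weight on y)
```

A predictive objective thus concentrates weight on the Markov blanket automatically, which is the property LCDM reads out of attention.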