AI x Science

The physics of generative AI: Can thermal noise replace neural networks?

January 22, 2026

A paper published just two days ago in Physical Review Letters presents an idea that challenges how we think about generative models: what if we could build systems that generate structured data not through neural network computations, but through the natural physics of thermal fluctuations?

Stephen Whitelam from Lawrence Berkeley National Laboratory introduces Generative Thermodynamic Computing, a framework where the noise-driven dynamics of a physical system—rather than a digital neural network—performs the generation of structured outputs from noise. The approach is elegant, deeply connected to fundamental physics, and potentially 11 orders of magnitude more energy-efficient than digital alternatives.

The setup: A thermodynamic computer

The system consists of $N$ classical, real-valued degrees of freedom $\mathbf{x} = \{x_1, x_2, \ldots, x_N\}$. These could physically represent voltage states in electrical circuits, oscillator positions in mechanical systems, or phases in Josephson junction devices. The key is that these are fluctuating quantities whose dynamics are governed by thermal interactions with their environment.

Each degree of freedom evolves according to overdamped Langevin dynamics:

$$\dot{x}_i = -\mu\,\frac{\partial V_\theta(\mathbf{x})}{\partial x_i} + \sqrt{2\mu k_B T}\;\eta_i(t)$$

Let's unpack this equation:

- $\dot{x}_i$ is the rate of change of unit $i$'s state.
- The drift term $-\mu\,\partial V_\theta(\mathbf{x})/\partial x_i$ pushes each unit downhill on the potential $V_\theta$, with the mobility $\mu$ setting how strongly it responds to forces.
- The noise term $\sqrt{2\mu k_B T}\,\eta_i(t)$ injects thermal fluctuations: $\eta_i(t)$ is Gaussian white noise and $k_B T$ is the thermal energy of the environment.

This is the physics that governs everything from Brownian motion to the dynamics of molecules in solution. The insight is to harness it for computation.
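To make this concrete, here is a minimal sketch (not code from the paper) of the discretized update a digital simulation of such a computer might use; the names `grad_V`, `mu`, `kBT`, and `dt` are placeholders I've chosen.

```python
import numpy as np

def langevin_step(x, grad_V, mu, kBT, dt, rng):
    """One Euler-Maruyama step of overdamped Langevin dynamics.

    x      : current state of all units, shape (N,)
    grad_V : function returning dV_theta/dx at x, shape (N,)
    mu     : mobility; kBT : thermal energy; dt : timestep
    """
    noise = rng.standard_normal(x.shape)            # unit-variance Gaussian noise
    drift = -mu * grad_V(x) * dt                    # relax downhill on the potential
    return x + drift + np.sqrt(2.0 * mu * kBT * dt) * noise
```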

The energy landscape

The potential energy function $V_\theta(\mathbf{x})$ defines the computational landscape:

$$V_\theta(\mathbf{x}) = \sum_{i=1}^{N}\left(J_2 x_i^2 + J_4 x_i^4\right) + \sum_{i=1}^{N} b_i x_i + \sum_{\langle ij\rangle} J_{ij}\, x_i x_j$$

Three components shape this landscape:

1. Intrinsic nonlinearity (first sum): The $J_2$ and $J_4$ terms create the basic response of each unit. With $J_4 > 0$, units become nonlinear, which is essential for the system to function as more than simple linear algebra. The quartic term also ensures thermodynamic stability as the coupling parameters are adjusted during training.

2. External biases (second sum): The $b_i$ terms are input signals applied to each unit, used to inject information into the system.

3. Pairwise couplings (third sum): The $J_{ij}$ terms couple different units together. These are the trainable parameters that will encode the learned structure; they are analogous to the weights in a neural network.

The architecture mirrors diffusion models: visible units ($N_v = 28^2 = 784$ for MNIST) serve as the display, while hidden units ($N_h = 512$ in the demonstration) perform computation. The trainable couplings connect visible-to-hidden and hidden-to-hidden units.
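In a digital simulation, one convenient (assumed) representation stores all couplings in a symmetric matrix `J` with zero diagonal, zeroing the entries that are not trainable (e.g. visible-to-visible pairs). A sketch of the potential and its gradient under that assumption:

```python
import numpy as np

def potential(x, J2, J4, b, J):
    """V_theta(x): on-site terms + biases + pairwise couplings (J symmetric, zero diagonal)."""
    onsite = np.sum(J2 * x**2 + J4 * x**4)
    bias = b @ x
    pair = 0.5 * x @ J @ x        # factor 1/2: the symmetric matrix counts each pair twice
    return onsite + bias + pair

def grad_potential(x, J2, J4, b, J):
    """dV_theta/dx_i = 2 J2 x_i + 4 J4 x_i^3 + b_i + sum_j J_ij x_j."""
    return 2 * J2 * x + 4 * J4 * x**3 + b + J @ x
```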

Training: learning to reverse time

Here's where the physics gets beautiful. The training objective is to find couplings that allow the computer to generate the reverse of a noising trajectory.

The forward process (noising)

Start with a structured image projected onto the visible units through their biases. Set all trainable couplings $J_{ij}$ to zero. Let the system equilibrate, then run the dynamics while gradually reducing the bias intensity. The image degrades into noise; this is the "noising" process familiar from diffusion models.
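Here is a sketch of how this noising stage might look in a digital simulation. The bias-ramp schedule, the equilibration length, and the choice of zero biases on the hidden units are my assumptions, not specifics from the paper.

```python
import numpy as np

def noising_trajectory(image, n_hidden, n_steps, J2, J4, b_strength,
                       mu, kBT, dt, rng, n_equil=500):
    """Record a forward (noising) trajectory with all trainable couplings set to zero."""
    n_vis = image.size
    n = n_vis + n_hidden
    b0 = np.concatenate([b_strength * image.ravel(), np.zeros(n_hidden)])  # image enters via biases

    def grad_V(x, b):
        # couplings are zero during noising, so only on-site terms and biases act
        return 2 * J2 * x + 4 * J4 * x**3 + b

    x = 0.1 * rng.standard_normal(n)
    for _ in range(n_equil):                          # equilibrate with the full bias applied
        x += -mu * grad_V(x, b0) * dt + np.sqrt(2 * mu * kBT * dt) * rng.standard_normal(n)

    traj = [x.copy()]
    for k in range(n_steps):                          # ramp the bias down to zero
        b_k = b0 * (1.0 - (k + 1) / n_steps)
        x += -mu * grad_V(x, b_k) * dt + np.sqrt(2 * mu * kBT * dt) * rng.standard_normal(n)
        traj.append(x.copy())
    return np.array(traj)                             # shape (n_steps + 1, n)
```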

The reverse probability via Onsager-Machlup

The key theoretical tool is the Onsager-Machlup action, which gives the probability that a particular trajectory was generated by the Langevin dynamics.

Using a discretized Euler scheme with timestep $\Delta t$:

$$x_i(t+\Delta t) = x_i(t) - \mu\,\frac{\partial V_\theta}{\partial x_i}\,\Delta t + \sqrt{2\mu k_B T\,\Delta t}\;\eta_i$$

The displacement $\Delta x_i = x_i(t+\Delta t) - x_i(t)$ requires drawing noise values $\eta_i$. Inverting this relationship gives:

$$\eta_i = \frac{\Delta x_i + \mu\,\frac{\partial V_\theta}{\partial x_i}\,\Delta t}{\sqrt{2\mu k_B T\,\Delta t}}$$

Since the $\eta_i$ are Gaussian with unit variance, the probability of generating a forward step $\mathbf{x} \to \mathbf{x}+\Delta\mathbf{x}$ is:

$$P_\theta^{\mathrm{step}}(\Delta\mathbf{x}) = (2\pi)^{-N/2}\prod_{i=1}^{N}\exp\!\left(-\eta_i^2/2\right)$$

Taking the negative log-probability (up to an additive constant):

$$-\ln P_\theta^{\mathrm{step}}(\Delta\mathbf{x}) = \sum_{i=1}^{N}\frac{\left[\Delta x_i + \mu\,\partial_i V_\theta(\mathbf{x})\,\Delta t\right]^2}{4\mu k_B T\,\Delta t}$$

For the reverse step ($\mathbf{x}+\Delta\mathbf{x} \to \mathbf{x}$), the displacement flips sign:

$$-\ln \tilde{P}_\theta^{\mathrm{step}}(\Delta\mathbf{x}) = \sum_{i=1}^{N}\frac{\left[-\Delta x_i + \mu\,\partial_i V_\theta(\mathbf{x})\,\Delta t\right]^2}{4\mu k_B T\,\Delta t}$$
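In code, both step probabilities take only a few lines. This is a sketch consistent with the discretization above (force evaluated at $\mathbf{x}$, additive constants common to all steps dropped):

```python
import numpy as np

def neg_log_p_forward(dx, grad_V_x, mu, kBT, dt):
    """-ln P of the forward step x -> x + dx, up to an additive constant."""
    r = dx + mu * grad_V_x * dt
    return np.sum(r**2) / (4 * mu * kBT * dt)

def neg_log_p_reverse(dx, grad_V_x, mu, kBT, dt):
    """-ln P~ of the reversed step x + dx -> x: the displacement flips sign."""
    r = -dx + mu * grad_V_x * dt
    return np.sum(r**2) / (4 * mu * kBT * dt)
```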

Gradient descent on couplings

To maximize the probability of generating the reverse trajectory, we sum over all $K$ steps of the recorded trajectory, differentiate with respect to each parameter, and take gradient-ascent steps with learning rate $\alpha$:

$$J_{ij} \leftarrow J_{ij} + \alpha \sum_{k=1}^{K} \frac{\partial}{\partial J_{ij}} \ln \tilde{P}_\theta^{\mathrm{step}}[\Delta\mathbf{x}(t_k)]$$

$$b_i \leftarrow b_i + \alpha \sum_{k=1}^{K} \frac{\partial}{\partial b_i} \ln \tilde{P}_\theta^{\mathrm{step}}[\Delta\mathbf{x}(t_k)]$$

The gradients can be computed analytically:

$$\frac{\partial}{\partial J_{ij}}\ln \tilde{P}_\theta^{\mathrm{step}} = \frac{\left[\Delta x_i - \mu\,\partial_i V_\theta(\mathbf{x})\,\Delta t\right]x_j + \left[\Delta x_j - \mu\,\partial_j V_\theta(\mathbf{x})\,\Delta t\right]x_i}{2 k_B T}$$

$$\frac{\partial}{\partial b_i}\ln \tilde{P}_\theta^{\mathrm{step}} = \frac{\Delta x_i - \mu\,\partial_i V_\theta(\mathbf{x})\,\Delta t}{2 k_B T}$$

where the energy gradient is:

$$\partial_i V_\theta(\mathbf{x}) = 2J_2 x_i + 4J_4 x_i^3 + b_i + \sum_{j\in\mathcal{N}(i)} J_{ij}\, x_j$$

This is remarkably clean: the gradients depend only on local information—the displacements, forces, and neighboring unit values.
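Because everything is local, one training update over a recorded trajectory is short to write down. This sketch is my own, reusing the matrix representation and `grad_potential` from the earlier snippet; `alpha` is the learning rate and `mask` marks which couplings are trainable.

```python
import numpy as np

def reverse_step_gradients(dx, x, grad_V_x, kBT, mu, dt):
    """Analytic gradients of ln P~ for one recorded step, per the expressions above."""
    r = (dx - mu * grad_V_x * dt) / (2 * kBT)    # per-unit mismatch between step and drift
    dlnP_db = r                                  # bias gradient
    dlnP_dJ = np.outer(r, x) + np.outer(x, r)    # coupling gradient (symmetric)
    np.fill_diagonal(dlnP_dJ, 0.0)               # no self-couplings
    return dlnP_dJ, dlnP_db

def train_on_trajectory(traj, J, b, J2, J4, mu, kBT, dt, alpha, mask):
    """One gradient-ascent update on ln P~, summed over a recorded noising trajectory."""
    sum_dJ = np.zeros_like(J)
    sum_db = np.zeros_like(b)
    for x, x_next in zip(traj[:-1], traj[1:]):
        grad_V_x = grad_potential(x, J2, J4, b, J)        # from the earlier sketch
        dJ, db = reverse_step_gradients(x_next - x, x, grad_V_x, kBT, mu, dt)
        sum_dJ += dJ
        sum_db += db
    J += alpha * sum_dJ * mask        # mask keeps only the trainable couplings
    b += alpha * sum_db
    return J, b
```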

The thermodynamic interpretation: Minimizing heat

Here's where the framework connects to fundamental physics. Consider the ratio of two probabilities: the forward-step probability under the reference couplings ($\theta = 0$) and the reverse-step probability under the trained couplings:

$$\ln\frac{P_0^{\mathrm{step}}(\Delta\mathbf{x})}{\tilde{P}_\theta^{\mathrm{step}}(\Delta\mathbf{x})} \approx -\frac{\Delta Q_0 + \Delta Q_\theta}{2 k_B T}$$

where $\Delta Q_0$ and $\Delta Q_\theta$ are the incremental heat dissipated by the reference and trained computers, respectively.

Integrated over an entire trajectory $\omega$ (the noising trajectory, with $\tilde{\omega}$ its time-reverse and $\beta = 1/k_B T$):

$$\ln\frac{P_0[\omega]}{\tilde{P}_\theta[\tilde{\omega}]} = -\frac{1}{2}\left[\beta Q_0(\omega) + \beta Q_\theta(\omega)\right]$$

Training minimizes $-\ln \tilde{P}_\theta[\tilde{\omega}]$. Since the reference process is fixed, this is equivalent to minimizing $-Q_\theta(\omega)$, the negative of the heat that the denoising computer would dissipate in generating the noising trajectory.

But heat changes sign under time reversal. So the learning process minimizes the heat $Q_\theta(\tilde{\omega}) = -Q_\theta(\omega)$ emitted by the trained computer as it generates structure from noise.

The trained dynamics is thermodynamically optimal: it reconstructs the imposed data with minimal heat emission and entropy production. This links generative modeling directly to the second law of thermodynamics.
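With the reference trajectory $\omega$ held fixed, the logic above can be compressed into one line (dropping terms that do not depend on $\theta$):

$$\arg\min_\theta\left\{-\ln \tilde{P}_\theta[\tilde{\omega}]\right\} = \arg\min_\theta\left\{-\tfrac{\beta}{2}\,Q_\theta(\omega)\right\} = \arg\min_\theta\left\{Q_\theta(\tilde{\omega})\right\}$$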

Numerical results

Whitelam demonstrates the framework with a digital simulation using $J_2 = J_4 = 10\,k_B T$, 784 visible units (a $28\times 28$ grid), and 512 hidden units. Training uses only three MNIST digits.

The results show that the hidden units develop interpretable receptive fields: localized, digit-like structures that decompose inputs into visual components. These patterns act as the features that guide the energy landscape.

The energy efficiency argument

The thermodynamic advantage is striking. Consider the energy scales:

Digital neural network: A multiply-accumulate (MAC) operation costs ~1 pJ, or about $2.4\times 10^{8}\,k_B T$ at room temperature. A modest MLP denoiser (784→128→128→784) requires about $2.2\times 10^{5}$ MACs per step. Even with only 10 denoising steps, the energy budget exceeds $5\times 10^{14}\,k_B T$.

Thermodynamic computer: The heat emitted can be calculated from the potential energy difference between trajectory start and end: $Q = V_\theta[\mathbf{x}(0)] - V_\theta[\mathbf{x}(t_f)]$. Over 1000 denoising trajectories, the mean heat emission is $\langle Q\rangle = 2.9\times 10^{3}\,k_B T$ with standard deviation $3.5\times 10^{2}\,k_B T$.

The ratio exceeds $10^{11}$: the thermodynamic computer would be roughly 11 orders of magnitude more energy-efficient.
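The arithmetic behind that ratio is easy to reproduce from the numbers quoted above (the 300 K value of $k_B T$ is my assumption):

```python
# Back-of-the-envelope check of the quoted energy ratio.
kBT_joules = 4.14e-21                                  # thermal energy at ~300 K
mac_cost = 1e-12 / kBT_joules                          # ~1 pJ per MAC  ->  ~2.4e8 kBT
macs_per_step = 784 * 128 + 128 * 128 + 128 * 784      # ~2.2e5 MACs for a 784-128-128-784 MLP
digital_budget = mac_cost * macs_per_step * 10         # 10 denoising steps -> ~5e14 kBT
thermo_heat = 2.9e3                                     # mean heat emission reported, in kBT
print(f"ratio ~ {digital_budget / thermo_heat:.1e}")    # ~1.8e11
```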

How this differs from Boltzmann Machines

The system resembles a nonequilibrium, continuous-spin Boltzmann machine, but with crucial differences:

Boltzmann Machine:

- Samples from an equilibrium (Boltzmann) distribution defined by its energy function
- Must equilibrate before its state counts as a valid sample
- Typically built from binary units

Langevin Computer:

- Continuous-valued degrees of freedom
- Driven, nonequilibrium dynamics with trainable couplings
- Stochasticity supplied by physical thermal noise

The Langevin computer runs on a physical clock—computation happens at a designated time without requiring equilibration. This is fundamentally different from sampling an equilibrium distribution.

What this means for hardware

If realized in analog hardware—networks of mechanical oscillators, electrical circuits, or superconducting devices—the system would:

  1. Generate structured outputs by simply evolving with time under natural dynamics
  2. Require no added pseudorandom noise—thermal fluctuations provide the stochasticity
  3. Need no neural network guidance—the learned couplings encode all necessary information in the energy landscape

The paper notes that hybrid approaches could work too: a neural network could adjust the computer's couplings as a function of time, or set couplings to produce conditioned outputs. But the core insight is that analog hardware alone can be generative.
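For the digital simulation, generation then reduces to a handful of lines. This sketch reuses the placeholder names from the earlier snippets; the noise initial condition and the $28\times 28$ readout are assumptions on my part.

```python
import numpy as np

def generate(J, b, J2, J4, n_vis, n_steps, mu, kBT, dt, rng):
    """Run the trained dynamics from noise for a fixed time, then read the visible units."""
    x = rng.standard_normal(J.shape[0])                  # start from noise
    for _ in range(n_steps):
        grad = 2 * J2 * x + 4 * J4 * x**3 + b + J @ x    # dV_theta/dx
        x += -mu * grad * dt + np.sqrt(2 * mu * kBT * dt) * rng.standard_normal(x.shape)
    return x[:n_vis].reshape(28, 28)                     # visible units are the display
```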

Open questions

Several questions remain about how the approach scales beyond this small demonstration, and about how the trainable couplings would be realized and adjusted in physical hardware.

The bigger picture

What strikes me about this work is how it reframes generative modeling as a question of physics rather than computation. The training process isn't just optimizing a loss function—it's finding the dynamics that minimizes entropy production. The generated outputs aren't just samples from a learned distribution—they're the thermodynamically optimal reconstructions of structured data.

This connects machine learning to a century of work on nonequilibrium statistical mechanics, opening possibilities for understanding generative models through the lens of physical law. Whether or not thermodynamic computers become practical hardware, this perspective enriches our understanding of what generation fundamentally means.