Towards translational and mechanistic whole-brain simulations with The Virtual Brain

Track 4 Workshop — FAIR Brain Data Science Bootcamp

Leon Stefanovski
Marius Pille
Leon Martin

May 12, 2026

Welcome

Schedule

Day 1
Block 1 - 09:00-12:00:
1. Intro to Brain Network Modelling
2. Defining a FAIR brain network model
3. Setup & First examples
4. Applications in Basic and Clinical Research

Block 2 - 13:00-15:00:
5. Bifurcation Theory
6a. Parameter exploration & inference I

Block 3 - 15:30-17:30:
6b. Parameter exploration & inference II
7. Stimulating the brain

Day 2
Block 1 - 09:00-11:30:
Mini hackathon - Bring your own data / research question
Plotting the Brain

Block 2 - 12:00-13:30:
Mini hackathon - Bring your own data / research question
Plotting the Brain

Workshop Website

virtual-twin.github.io/Brain-Simulation-Workshop

1. Intro to Brain Network Modelling

Why network models? Same lesion, different outcomes

  • Two patients, similar lesion location and volume
  • Different deficits, different recovery trajectories
  • The locus alone doesn’t predict outcome
  • Function lives in the network: regional damage propagates through the connections that survive

Same lesion, same network position, but a hub lesion (left, degree 6) wipes out most of the network’s input, while a peripheral lesion (right, degree 1) barely touches it.

To predict outcome, model the whole network, not a catalogue of regions.

Why dynamics? The brain is never idle

  • At rest, the brain shows structured spontaneous activity, not noise
  • Resting-state networks are reproducible across subjects and species
  • Tasks are perturbations of this ongoing dynamical state
  • Like a tennis player at the baseline: never still, poised for the next service

A model has to reproduce the resting repertoire, not only stimulus responses.

Resting activity for 8 regions in 3 modules. Top: traces wiggle in sync within each module, decoupled across them. Bottom: the resulting FC matrix shows three clear blocks.

Outputs match what we measure

  • TVB outputs what scanners record: EEG · MEG · SEEG · fMRI BOLD
  • Each modality is a forward model (\(h\)) on the same simulated state
  • Timescales come for free, EEG/SEEG in ms, BOLD in seconds

Direct comparison with data, no translation gap.

One simulated state, three observables. Top: EEG/MEG-like fast oscillations. Middle: SEEG with sharp transients. Bottom: BOLD, the slow envelope through the HRF.

A short history of TVB

From scans to a simulator: the TVB recipe

Ingredient Symbol Source
Connectome (weights, delays) \(A_{ij},\ \tau_{ij}\) DTI / tractography
Local dynamics \(f\) Neural mass / mean-field model
Coupling \(g\) Network propagation, with delays
Observation \(h\) BOLD / EEG / MEG forward model

The next slides unpack each building block.

Building Block 1: The Connectome

Diffusion MRI → tractography → parcellation → connectivity matrix. The output is the structural backbone \(A_{ij}\) (weights) and \(\tau_{ij}\) (delays).

Building Block 2: Dynamical Systems

A mass \(m\) on a spring with stiffness \(k\).
Perpetual oscillation at \(\omega = \sqrt{k/m}\).

State Equations: \[ \dot{x} = v \] \[ \dot{v} = - \frac{k}{m}*x \]

State Variables

Variable Initial Value Unit Description
\(x\) 2.0 \(\mathrm{m}\) Displacement from equilibrium
\(v\) 0.0 \(\frac{\mathrm{m}}{\mathrm{s}}\) Velocity

Parameters

Parameter Value Unit Description
\(k\) 0.0001 \(\frac{\mathrm{N}}{\mathrm{m}}\) Spring stiffness constant
\(m\) 1.0 \(\mathrm{kg}\) Mass of the oscillating body

Initial Conditions

Same spring-mass system as before.

What changes?

  • Initial displacement only: \(x_0 \in \{-1,-2,-3\}\)
  • Velocity stays fixed at \(v_0 = 0\)
  • Parameters stay fixed: \(k=10^{-4},\; m=1\)

What do we see?

  • Same frequency in all cases
  • Larger initial stretch gives larger amplitude
  • Closed motion stays perfectly periodic

Phase Space

Same spring-mass system. Only the representation changes.

What changes?

  • Plot state as \((x,v)\) instead of \(x(t)\)
  • Use several initial conditions
  • Add the vector field of the dynamics

What do we see?

  • Closed trajectories around equilibrium
  • Conservative dynamics with no damping
  • Geometry of the flow instead of time evolution

Parameters — Mass

Same equations and same initial condition. Only the mass changes.

What changes?

  • \(m \in \{0.5, 1.0, 2.0\}\)
  • \(x_0=-2\), \(v_0=0\) stay fixed
  • \(k\) stays fixed

What do we see?

  • Frequency follows \(\omega = \sqrt{k/m}\)
  • Heavier masses oscillate more slowly
  • Box size encodes the mass value visually

Towards Realism — Damping & Load

Now we modify the toy model toward a more realistic oscillator.

What changes?

\[ \dot{v} = -\frac{k}{m}x - \frac{c}{m}v + g_{\mathrm{eff}} \]

  • Add damping: \(c=0.006\)
  • Add constant load: \(g_{\mathrm{eff}}=0.00015\)

What do we see?

  • Oscillations decay over time
  • Equilibrium shifts away from zero
  • Compare ideal toy model vs damped + load

Phase Space: Damped Spring

Spring Oscillator

Damped Spring

Towards Brain Dynamics: Mean Field Models

Mean field models & Neural Masses

  • Population rate: \(r(t) = \langle \#\text{spikes}\rangle / (N\,\Delta t)\)
  • Jansen-Rit (3 populations): PC = \(y_1 - y_2\), the EEG-like PSP at pyramidal cells

EEG-Like Activity with the Jansen-Rit-Model

JansenRit1995 Lumped-parameter cortical column model with three neural populations: pyramidal cells, excitatory interneurons, and inhibitory interneurons. Each population is modeled by a PSP block (2nd-order linear filter) and a sigmoidal potential-to-firing-rate transformation.

Derived Variables \[ v_{pyr} = y_{1} - y_{2} \]

State Equations \[ \dot{y_{0}} = y_{3} \] \[ \dot{y_{3}} = - y_{0}*a^{2} - 2*a*y_{3} + A*a*\operatorname{Sigm}{\left(y_{1} - y_{2} \right)} \] \[ \dot{y_{1}} = y_{4} \] \[ \dot{y_{4}} = - y_{1}*a^{2} - 2*a*y_{4} + A*a*\left(p + C_{2}*\operatorname{Sigm}{\left(C_{1}*y_{0} \right)} + K*c_{intercolumn}\right) \] \[ \dot{y_{2}} = y_{5} \] \[ \dot{y_{5}} = - y_{2}*b^{2} - 2*b*y_{5} + B*C_{4}*b*\operatorname{Sigm}{\left(C_{3}*y_{0} \right)} \]

with \[ \operatorname{Sigm}{\left(v \right)} = \frac{2*e_{0}}{1 + e^{r*\left(v_{0} - v\right)}} \]

State Variables

Variable Initial Value Unit Description
\(y_{0}\) 0.1 \(\mathrm{mV}\)
\(y_{3}\) 0.1 \(\frac{\mathrm{mV}}{\mathrm{s}}\)
\(y_{1}\) 0.1 \(\mathrm{mV}\)
\(y_{4}\) 0.1 \(\frac{\mathrm{mV}}{\mathrm{s}}\)
\(y_{2}\) 0.1 \(\mathrm{mV}\)
\(y_{5}\) 0.1 \(\frac{\mathrm{mV}}{\mathrm{s}}\)

Parameters

Parameter Value Unit Description
\(A\) 3.25 \(\mathrm{mV}\) Maximum amplitude of excitatory PSP (EPSP)
\(B\) 22 \(\mathrm{mV}\) Maximum amplitude of inhibitory PSP (IPSP)
\(C\) 135 Lumped connectivity constant (C = C1)
\(a\) 100 \(\mathrm{s}^{-1}\) Reciprocal of EPSP time constant
\(b\) 50 \(\mathrm{s}^{-1}\) Reciprocal of IPSP time constant
\(v_{0}\) 6 \(\mathrm{mV}\) PSP for 50% firing rate (sigmoid midpoint)
\(e_{0}\) 2.5 \(\mathrm{s}^{-1}\) Half of maximum firing rate
\(r\) 0.56 \(\mathrm{mV}^{-1}\) Steepness of sigmoid transformation
\(p\) 220 \(\mathrm{s}^{-1}\) Mean external input (afferent pulse density)
\(K\) 0 Per-node coupling strength scaling (K[i] scales coupling received by node i)

Derived Parameters

\[ C_{1} = C \] \[ C_{2} = 0.8*C \] \[ C_{3} = 0.25*C \] \[ C_{4} = 0.25*C \]

Building Block 3: Coupling → Network Dynamics

Uncoupled system

Kuramoto (1975)

Phase oscillator model. Each node evolves with its own natural frequency \(\omega\); coupling \(I\) is added by the network.

State Equations \[\dot{\theta} = \omega\]

State Variables

Variable Initial value Unit
\(\theta\) \(\sim\mathcal{U}(0,2\pi)\)

Derived Variables \[activity = \sin(\theta)\]

Parameters

Parameter Value Unit Description
\(\omega\) 1.0 Natural frequency

Simulation settings

Duration 100 s
Step size 0.01 s
Integrator Heun
Nodes 3 (V1, V2, MT)
Coupling \(K\) 0.0 (uncoupled)

Coupled system

Kuramoto (1975) — same model as before, now with coupling enabled.

State Equations \[\dot{\theta} = I + \omega\]

Coupling: \(I\)

\[ I = \frac{K \sum_{j=0}^{-1 + N} - {g}_{i,j} \sin{\left(\theta_{i} - \theta_{j} \right)}}{3} \]

Receives \(\theta\) from connected regions.

Pre-synaptic: \(c_{\text{pre}} = \sin(\theta_j - \theta_i)\)

Post-synaptic: \(c_{\text{post}} = \dfrac{K \cdot gx}{3}\)

Coupling parameters

Parameter Value Description
\(K\) 0.4 Global coupling strength

Network

Edge Weight
V1 → V2 0.7
V2 → MT 0.4
V1 → MT 0.2

Simulation settings

Duration 100 s
Step size 0.01 s
Integrator Heun

Mathematical framework

\[dS_i = \left[f_d(S_i, \theta^d, C_i, I_i) \right]dt + g(S_i, \theta^g)\, dW_i\]

State Evolution

  • \(S_i\) - state variables at node \(i\)
  • \(f_d\) - dynamics function with parameters \(\theta^d\)
  • \(C_i\) - coupling input from connected nodes
  • \(I_i\) - external input (stimulation, driving signals)
  • \(g\) - noise diffusion coefficient with parameters \(\theta^g\)
  • \(dW_i\) - Wiener process (stochastic fluctuations)

============================================================
STEP 1: Running simulation...
============================================================
  Simulation period: 10.0 ms, dt: 0.01220703125 ms
  Transient period: 0.0 ms
  Simulation complete.

============================================================
Experiment complete.
============================================================

Coupled Brain Network

\[ C_i = f_c^{\text{post}}\Big(\sum_j A_{ij}\, f_c^{\text{pre}}(S_i, S_j(t-\tau_{ij}), \theta^c), S_i, \theta^c\Big) \]

Coupling

  • \(f_c^{\text{pre}}\) - pre-aggregation transformation
  • \(A_{ij}\) - structural connectivity weight (\(j \to i\))
  • \(\tau_{ij}\) - transmission delay (tract length / conduction speed)
  • \(f_c^{\text{post}}\) - post-aggregation transformation
  • \(\theta^c\) - coupling parameters

2. Defining a FAIR brain network model

FAIR Guiding Principles

(The Turing Way Community and Scriberia 2024)

FAIR Principles

(The Turing Way Community and Scriberia 2024)

A Simulation Experiment in Computational Neuroscience

A Simulation Experiment in Computational Neuroscience

SimulationExperiment
Network
Parcellation Node Edge
Dynamics
StateVariable Parameter Equation DerivedVariable DerivedParameter Function
Coupling
Parameter Equation IncomingState LocalState
Integration
Method Noise InitialCondition TimeSeries
Event
Equation Target
Stimulus
Waveform
Observation
Source Pipeline DataSource DerivedObs.
Exploration
Exploration Algorithm
Analysis
Optimization Continuation
Environment
SoftwareEnv. Execution Reference Author

Model Specification with TVB-O

Metadata Schema

id: 21
model: Dynamics
dynamics:
  name: Dynamics
  parameters:
    a:
      name: a
      value: 1.0
  state_variables:
    x:
      name: x
      equation:
        rhs: a*x
      initial_value: 1.0
integration:
  method: Heun
  time_scale: ms
  duration: 4.0
  step_size: 0.01
explorations:
  name: a_sweep
  space:
  - parameter: a
    explored_values:
    - 1.0
    - 0.0
    - -1.0

Python API

from tvbo import DynamicalSystem, SimulationExperiment
from tvbo.datamodel.schema import Exploration, ExplorationAxis

bsplot.style.use("tvbo")

g = DynamicalSystem(
    state_variables={"x": {
                          "equation": {"rhs": "a*x"},
                          "initial_value": 1.0}},
    parameters={"a": {"value": 1.0}},
)

exp = SimulationExperiment(dynamics=g)

exp.explorations = Exploration(
    name="a_sweep",
    space=ExplorationAxis(
      parameter="a",
      explored_values=[1.0, 0, -1.0]
      ),
)

exp.integration.duration = 4.0
exp.integration.step_size = 0.01

res = exp.run("tvboptim")

Specifying an experiment with TVB-O

What makes specifying brain network models easier?

  • Common vocabulary: Ontology-backed definitions link terms to persistent identifiers — no more ambiguous symbols across papers
  • Machine-readable metadata: Compact YAML schema captures everything needed to reproduce a simulation
  • Curated database: Published models, brain networks, and study configurations — ready to load and run
  • Code generation: One specification → executable code for TVB, JAX, Julia, and more

Applications in basic research

3. Setup & First Examples

See setup

4. Clinical Applications

5. Bifurcation Analysis

How to analyse a brain system

Time series \(x(t)\) at fixed \((a, x_0)\)

Phase plane flow geometry at fixed \(a\)

Bifurcation diagram regimes as \(a\) varies

Linear Systems

\[ \dot x = a\,x \quad\Rightarrow\quad x(t) = x_0\,e^{a t} \]

  • \(x(t)\): State variable
  • \(x_0\): Initial condition
  • \(a\): Parameter

Solution: \[ x(t) = x_0\,e^{a t} \]

Equilibrium: \[ \dot{x}(t) = ax(t) = 0? \] \[ \quad\Rightarrow\quad x_{eq} = 0 \]

2D portraits

2D portraits

Regime-change building blocks

  • Top: equilibria as \(a\) varies.
  • Bottom: local flow \(\dot x=f(x)\) at the dotted parameter slice.

Hopf: birth of an oscillation

\[ \dot z = (a + i\omega)z - |z|^2z, \qquad r_{\mathrm{cycle}} = \sqrt a \]

fixed point loses stability \(\Rightarrow\) stable oscillation appears

Dynamical Regimes

Numerical continuation

\[ F(x, a)=0, \qquad (x,a)=(x(s),a(s)) \]

  • follow branches through folds
  • detect folds and Hopf points
  • switch to periodic orbits

TVB simulation infused with bifurcation analysis

TVBO + tvboptim simulation of Generic2dOscillator on the Desikan-Killiany network. Nodes are colored by their instantaneous \(V(t)\) and projected onto axial (top) and sagittal views; the regime map places each node at its current \((I_{\mathrm{eff}}, V)\) on the bifurcation diagram.

Bifurcation analysis: hands-on

result = exp.run("bifurcationkit.jl")

6.. Parameter exploration and inference

Models as computational hypotheses

A brain network model is a hypothesis with knobs. Picking knob values is how we test the hypothesis against data.

G theta θ sim brain network model theta->sim yhat ŷ(θ) sim->yhat loss 𝓛(θ) yhat->loss yobs y yobs->loss

  • \(\theta\): model parameters (the knobs)
  • \(\hat{y}(\theta)\): simulated observable (FC, power spectrum, BOLD …)
  • \(y_{\text{obs}}\): empirical observable / data
  • \(\mathcal{L}\): loss / discrepancy

What is inference for brain models?

Inference: which \(\theta\) are consistent with the data?

Gradient descent finds a local minimum

What do we need to do?

  1. Define a loss function \(\mathcal{L}(\theta)\)
  2. Test many \(\theta\) configurations on \(\mathcal{L}(\theta)\) to find the “best”

Why inference is hard for brain models?

  • Expensive simulator: seconds to minutes per run
  • High-dimensional: when parameters become regional (\(\times N\) nodes)
  • Non-smooth dynamics: chaos, stochasticity, stiffness
  • Bumpy observables: FC, FCD, PSD are summary statistics
  • Parameter degeneracy: many parameter combinations fit about equally well

Every method is a bet about which of these difficulties dominates.

What makes inference for brain models easier?

  • Faster simulator: Parallelization and Accelerators (GPUs, TPUs, etc.)
  • Differentiability: Gradient-based optimization
  • Good Workflows: Building complexity gradually

The example problem: fMRI functional connectivity

Find global coupling \(G\) and recurrence \(w\) so simulated FC matches empirical FC in a Reduced Wong-Wang model on the Desikan-Killiany parcellation.

Optimization over a parameter landscape:
\(\mathcal{L}(\theta) = \mathrm{RMSE}(\hat{FC}(\theta),\, FC_{\text{obs}})\).

  • two knobs → still easy to picture
  • note the low-loss valley: many \((w, G)\) pairs fit about equally well → a degeneracy typical of brain network models

Defining the running example in TVB-Optim

TVB-Optim: A JAX based toolbox for whole-brain simulation and inference (Pille et al. 2025)

# 1. Compose the brain network model from graph, dynamics, coupling, noise
graph    = DenseGraph(weights, region_labels=region_labels)
dynamics = ReducedWongWang(w=0.5, I_o=0.32, INITIAL_STATE=(0.3,))
coupling = FastLinearCoupling(local_states=["S"], G=0.5)
noise    = AdditiveNoise(sigma=0.00283, apply_to="S")
network  = Network(dynamics=dynamics, coupling={"instant": coupling},
                   graph=graph, noise=noise)

# 2. prepare() splits what to compute (`model`, pure & jittable) from what to plug in (`state`)
model, state = prepare(network, Heun(), t1=120_000, dt=4.0)

# 3. Observation + loss are plain Python functions of `state`
# Every algorithm in this section calls the same loss(s); only how `state` is varied changes
def observation(state):
    return compute_fc(bold_monitor(model(state)), skip_t=20)

def loss(state):
    return rmse(observation(state), fc_target)

Parallelism: jax.vmap and jax.pmap under the hood

# Explore a 10x10 = 100 grid over (theta_1, theta_2)
state.coupling.theta_1 = GridAxis(-1.0, 1.0, 10)
state.dynamics.theta_2 = GridAxis(-1.0, 1.0, 10)
losses = ParallelExecution(loss, Space(state, mode="product"), n_pmap=8).run()

Grid search: map the whole landscape

  • Same idea as the bifurcation diagrams from §3
  • Wins when: \(\le 2\)\(3\) parameters, you want to see the landscape
  • Cost grows as \(N^d\) — unusable beyond a handful of dimensions
# Explore a 32x32 grid space of w and G
grid_state.dynamics.w = GridAxis(0.001, 0.7, 32)
grid_state.coupling.G = GridAxis(0.001, 0.7, 32)
grid = Space(grid_state, mode="product")
losses = ParallelExecution(loss, grid, n_pmap=8).run()

Random search: the surprising upgrade

In more than 2–3 dimensions, random search beats grid for the same budget.

Why? Real loss surfaces have low effective dimensionality, most parameters barely matter. Grid spends its budget evenly across all axes; random samples every axis at every trial (Bergstra and Bengio 2012).

Grid vs. Random search, taken from (Bergstra and Bengio 2012) figure 1
# Explore a 100 samples space of w and G
random_state.dynamics.w = UniformAxis(0.001, 0.7, 100)
random_state.coupling.G = UniformAxis(0.001, 0.7, 100)
samples = Space(random_state, mode="zip", key=jax.random.key(42))
losses = ParallelExecution(loss, samples, n_pmap=8).run()

Random search onto 2D loss surface. 1/10th the evaluation budget compared to grid search.

Quasi-random Sobol: Sensitivity for free

  • Sobol / Latin Hypercube sequences: low-discrepancy, drop-in replacement for the uniform sampler
  • Same design powers variance-based sensitivity analysis: Saltelli + Sobol indices quantify which knobs the loss actually depends on
problem = {"num_vars": 3,
           "names":  ["G", "w", "sigma"],
           "bounds": [[0.001, 0.7], [0.001, 0.7], [0.001, 0.01]]}
samples = sobol_sample.sample(problem, 256)   # Saltelli design

state.coupling.G  = DataAxis(samples[:, 0])
state.dynamics.w  = DataAxis(samples[:, 1])
state.noise.sigma = DataAxis(samples[:, 2])
losses = ParallelExecution(loss, Space(state, mode="zip")).run()
Si = sobol_analyze.analyze(problem, np.array(losses))

  • \(S_T(\sigma) \approx 0\)
  • \(S_T(G) \approx 0.93\)
  • \(S_T(w) \approx 0.49\)
  • \(S_2(G\times w) = 0.46\)

Take-away: the FC loss lives on an effective 2D subspace.

Evolutionary Approaches: adaptive, no gradients

  • Maintain a population, update a mean and covariance to bias future samples
  • Naturally parallel: every generation is one DataAxis evaluation
  • No gradients needed → works on noisy / discontinuous losses
  • Wins when: ~10–50 parameters
es = cma.CMAEvolutionStrategy([0.05, 0.6], 0.15,
  {"popsize": 16})
while not es.stop():
    pop = np.array(es.ask())
    s.coupling.G = DataAxis(pop[:, 0])
    s.dynamics.w = DataAxis(pop[:, 1])
    losses = ParallelExecution(
        loss, Space(s, mode="zip")).run()
    es.tell(pop.tolist(), np.array(losses).tolist())

CMA-ES on (G, w). Dots: population of 16 candidates per generation. Red ×: distribution mean. Red ellipse: 2-σ covariance. The ellipse rotates onto the valley.

Hitting the wall: why sampling stops scaling

Everything so far - grid, random / Sobol, Genetic algorithms - searches by sampling the simulator.

But a brain network model with regional parameters has \(d \sim 10^2\)\(10^4\) knobs (e.g. one \(w_i\) per Desikan-Killiany region, plus per-region noise, plus …). No sampling budget reaches that regime.

Automatic differentiation: Gradients (almost) for free

Run forward once, run backward once. Reverse-mode AD sweeps the computation graph in reverse, accumulating \(\partial\mathcal{L}/\partial\theta_i\) for every \(i\) at the same time → \(\text{cost}(\nabla_\theta \mathcal{L}) \approx 3\text{–}30 \times \text{cost}(\mathcal{L})\).

Gradient cost in tvboptim, normalized by one simulation. Reverse-mode (blue) is flat at \(\sim 20\times\) across \(d \in [1, 10^4]\), independent of dimension.

In JAX: jax.grad(loss). One line, gradients with respect to every knob — what made per-region fits tractable.

Gradient descent: The regime-changer

\[ \theta_{t+1} \;=\; \theta_t - \eta \,\nabla_\theta\, \mathcal{L}(\theta_t) \]

  • The cheapest way to scale to regional / per-node parameter vectors (hundreds–thousands of knobs)
  • Local: needs a reasonable starting point
  • Point estimate: Only finds a single good fit
state.dynamics.w = Parameter(state.dynamics.w)
state.coupling.G = Parameter(state.coupling.G)
opt = OptaxOptimizer(loss, optax.adam(0.01))
fitted, _ = opt.run(state, max_steps=200)

Gradient descent on (G, w). White path: 200 Adam iterates from a deliberately off-valley start into the low-loss valley. One trajectory, one optimum → gradient descent finds a good fit cheaply but does not characterize the valley itself.

Bayesian inference — when you want a posterior

A point estimate \(\theta^\star\) tells you one good fit. A posterior tells you all fits consistent with the data.

\[ \underbrace{p(\theta \mid y)}_{\text{posterior}} \;\propto\; \underbrace{p(y \mid \theta)}_{\text{likelihood}} \;\cdot\; \underbrace{p(\theta)}_{\text{prior}} \]

Priors encode what you (already) know

A prior \(p(\theta)\) is a probabilistic statement about parameters before you see this dataset. Three regimes:

  • Flat / uninformative: Uniform over a physiological range. “I know the bounds, nothing else.” Posterior is dominated by data.
  • Weakly informative: Normal around a literature value with generous spread. Regularizes, prevents pathological fits.
  • Strongly informative: derived from independent measurements: anatomy, tau / amyloid PET maps, receptor density, EEG power. “This patient’s region has elevated tau” → prior on local excitation per region.

Same simulator, same data, different priors → different posteriors. The prior is where domain knowledge enters the inference, cleanly and auditably.

HMC vs SVI: same model, two budgets

Same model, same wide priors (\(G, w \sim \mathcal{U}(0.001, 0.7)\)). Different inference algorithm.

HMC (left) and SVI (right) on the same model. Background: grid-scan loss landscape. Red points: posterior samples; red contours: KDE; histograms: marginals. HMC bends along the valley; the SVI Gaussian aligns with it locally but cannot curve.

HMC / NUTS — gold standard.

  • Asymptotically exact samples; recovers the curved ridge
  • ~4 h for 300 samples · simulator-bound

SVI — fast approximation.

  • Fits a Gaussian guide \(q_\phi(G, w)\) by gradient descent on the ELBO
  • Captures local direction of the valley, not its curvature
  • ~12 min — 20× faster

When gradients break: simulation-based inference

When the simulator is chaotic or a black box, sample anyway.

SBI prior prior p(θ) sim simulator (black box) prior->sim draw θ pairs (θᵢ, yᵢ) samples sim->pairs run nde neural density estimator pairs->nde train post p(θ | y) nde->post query y

Amortized: pay the simulation budget once, query any new \(y\) in milliseconds.

Gradients available → HMC / SVI. Gradients break → SBI.

Choosing a method: Cheat Sheet

method scales to cost (forward sims) what you get
Grid \(\le 3\) params \(N^d\) full landscape
Random / Sobol \(\sim 20\) \(10^2\)\(10^3\) landscape sketch
CMA-ES / GA \(\sim 50\) \(10^3\)\(10^4\) local optimum
Gradient descent \(10^4\)+ (regional) \(10^3\)\(10^5\) local optimum
SVI \(10^4\)+ \(10^4\)\(10^6\) approximate posterior
HMC / NUTS \(10^4\)+, simulator-bound \(10^5\)–??? (asymptotically-)exact posterior
SBI (NPE / NLE) \(\sim 10^2\), simulator-bound \(10^4\)\(10^6\) upfront, amortized amortized approximate posterior

Cost numbers assume reverse-mode AD makes a gradient step ≈ one forward sim. Bayesian rows sit at the expensive end, a posterior is worth roughly a thousand point estimates’ worth of compute.

Chaos eats your gradient

Not a bug in automatic differentiation, the dynamics are telling you the loss is non-smooth at fine scales.

Mitigations

  • integrate over shorter windows
  • smooth observables
  • multiple shooting
  • gradient clipping

Near a bifurcation, the AD gradient (b) diverges from the finite-difference ground truth.

The TVB-Optim workflow

  1. Bifurcation map (§3) → pick a regime of interest
  2. Coarse random / Sobol search within that regime → find a basin
  3. Gradient descent from the basin → polish to a local optimum
  4. Optional: Bayesian inference (HMC, SVI, or SBI when gradients break) from the optimum → posterior + uncertainty

→ each step uses the output of the previous as a warm start.

Use-cases

I — fMRI BOLD FC

Reduced Wong-Wang → match empirical FC.

  • Loss: RMSE on FC matrices
  • Path: grid (G, w) → global gradient fit → per-region w
  • Open notebook

II — MEG Peak Frequency

Jansen-Rit with delays → match the alpha-band gradient.

  • Target: 7 Hz (association) → 11 Hz (visual cortex)
  • Loss: \(1 - \text{mean}(\mathrm{corr})\) across regional PSDs
  • Path: grid (a, b) → per-region (a, b) gradient fit
  • Open notebook

Both notebooks share the same TVB-Optim idioms (Parameter, .shape, Space, @cache) on different observables.

7. Stimulating the brain

Visual evoked potentials

(Jansen and Rit 1995)

Original

Specification

- id: 6
  label: "VEP: different columns, with delay
          (Fig. 11)"
  dynamics:
    name: JansenRit1995_Delayed_VEP
    parameters:
      A: {value: 3.25, unit: mV}
      B: {value: 22, unit: mV}
      C: {value: 135}
      a: {value: 100, unit: s^-1}
      b: {value: 50, unit: s^-1}
      v0: {value: 6, unit: mV}
      e0: {value: 2.5, unit: s^-1}
      r: {value: 0.56, unit: mV^-1}
      p:
        value: 220
        unit: s^-1
        shape: "(n_nodes,)"
        distribution: *dist_p_uniform
      a_d: {value: 30, unit: s^-1}
      K: {value: 0, shape: "(n_nodes,)"}
    derived_parameters: *jr_derived
    functions: *jr_functions
    coupling_terms: *jr_coupling
    state_variables:
      y0: {equation: {rhs: y3}}
      y3:
        equation:
          rhs: >-
            A*a*Sigm(y1 - y2)
            - 2*a*y3 - a**2*y0
      y1: {equation: {rhs: y4}}
      y4:
        equation:
          rhs: >-
            A*a*(p + P
            + C2*Sigm(C1*y0) + z0)
            - 2*a*y4 - a**2*y1
      y2: {equation: {rhs: y5}}
      y5:
        equation:
          rhs: >-
            B*b*(C4*Sigm(C3*y0))
            - 2*b*y5 - b**2*y2
      z0: {equation: {rhs: z1}}
      z1:
        equation:
          rhs: >-
            A*a_d*K*c_intercolumn
            - 2*a_d*z1 - a_d**2*z0
    derived_variables:
      v_pyr: {equation: {rhs: y1 - y2}}
    output: [v_pyr]
  network:
    number_of_regions: 2
    number_of_nodes: 2
    conduction_speed: {value: 0}
    nodes:
      - {id: 0}
      - id: 1
        parameters:
          B: {value: 17.6}
          C: {value: 108}
    edges:
      - source: 0
        target: 1
        directed: true
      - source: 1
        target: 0
        directed: true
    coupling: *sigmoidal_coupling
  events: *flash_events
  integration: *integration_vep
  execution:
    random_seed: 42
  explorations:
    VEP_delayed_fig11:
      mode: zip
      n_trials: 40
      average: trials
      parameters:
        K:
          element_domains:
            - element: 0
              explored_values: [300, 100]
            - element: 1
              explored_values: [1000, 4000]

Reproduced

Stimulation with Bayesian inference

Hands-on notebook: 5_stimulation_with_bayesian_inference.qmd

  • Recover stimulus amplitude and excitability from a noisy recording
  • See the degeneracy ridge in the loss landscape
  • Use priors as hypotheses to pick a mechanistic explanation
  • Compare Bayesian posteriors against multi-start gradient descent

End of Day 1

         \(^_^)/

Day 2 — Mini Hackathon and Plot Your Brain Tutorial

Hackathon

  • pitch your question (5 min each)
  • form small groups
  • checkpoint + wrap-up times

Bring your own…

  • data formats we can support

Brain Plotting with bsplot

uv pip install bsplot

Wrap-up

Acknowledgements

Christoph Huettl

Rico Schmitt

Lia Domide

Romina Baila

Jil Meier

Halgurd Taher

Konstantin Buelau

Dionysios Perdikis

Michael Schirner

Leon Stefanovski

Claude Code

Petra Ritter

Thank You & Feedback

Your Feedback

Marius Pille
marius.pille [at] bih-charite.de

Leon Martin
leon.martin [at] bih-charite.de

Leon Stefanovski
leon.stefanovski [at] charite.de

Charite logo BIH logo BSS logo

References

Bergstra, James, and Yoshua Bengio. 2012. “Random Search for Hyper-Parameter Optimization.” Journal of Machine Learning Research 13 (null): 281–305. https://dl.acm.org/doi/abs/10.5555/2188385.2188395.
Jansen, Ben H., and Vincent G. Rit. 1995. “Electroencephalogram and Visual Evoked Potential Generation in a Mathematical Model of Coupled Cortical Columns.” Biological Cybernetics 73 (4): 357–66. https://doi.org/10.1007/bf00199471.
Koller, Dominik P., Michael Schirner, and Petra Ritter. 2024. “Human Connectome Topology Directs Cortical Traveling Waves and Shapes Frequency Gradients.” Nature Communications 15 (1). https://doi.org/10.1038/s41467-024-47860-x.
Kuramoto, Yoshiki. 1975. “Self-Entrainment of a Population of Coupled Non-Linear Oscillators.” In Lecture Notes in Physics, 420–22. Springer-Verlag. https://doi.org/10.1007/bfb0013365.
Martin, Leon, Konstantin Buelau, Marius Pille, Rico Andre Schmitt, Christoph Huettl, Jil M Meier, Halgurd Taher, Dionysios Perdikis, Michael Schirner, and Petra Ritter. 2025. “The Virtual Brain Ontology: A Digital Knowledge Framework for Reproducible Brain Network Modeling.” bioRxiv, November. https://doi.org/10.1101/2025.11.19.689211.
Pille, Marius, Leon Martin, Emilius Richter, Dionysios Perdikis, Michael Schirner, and Petra Ritter. 2025. “Fast and Easy Whole-Brain Network Model Parameter Estimation with Automatic Differentiation.” bioRxiv, November. https://doi.org/10.1101/2025.11.18.689003.
Schirner, Michael, Gustavo Deco, and Petra Ritter. 2023. “Learning How Network Structure Shapes Decision-Making for Bio-Inspired Computing.” Nature Communications 14 (1). https://doi.org/10.1038/s41467-023-38626-y.
The Turing Way Community, and Scriberia. 2024. “Illustrations from the Turing Way: Shared Under CC-BY 4.0 for Reuse.” Zenodo. https://doi.org/10.5281/zenodo.13882307.