Towards translational and mechanistic whole-brain simulations with The Virtual Brain

Track 4 Workshop — FAIR Brain Data Science Bootcamp

Leon Stefanovski

Marius Pille

Leon Martin

May 12, 2026

Welcome

Two-day hands-on workshop on personalized whole-brain modeling with TVB
Day 1: structured sessions
Day 2: mini hackathon — bring your own questions / data
Useful links:

Schedule

Day 1
Block 1 - 09:00-12:00:
1. Intro to Brain Network Modelling
2. Defining a FAIR brain network model
3. Setup & First examples
4. Applications in Basic and Clinical Research

Block 2 - 13:00-15:00:
5. Bifurcation Theory
6a. Parameter exploration & inference I

Block 3 - 15:30-17:30:
6b. Parameter exploration & inference II
7. Stimulating the brain

Day 2
Block 1 - 09:00-11:30:
Mini hackathon - Bring your own data / research question
Plotting the Brain

Block 2 - 12:00-13:30:
Mini hackathon - Bring your own data / research question
Plotting the Brain

Workshop Website

virtual-twin.github.io/Brain-Simulation-Workshop

1. Intro to Brain Network Modelling

Why network models? Same lesion, different outcomes

Two patients, similar lesion location and volume
Different deficits, different recovery trajectories
The locus alone doesn’t predict outcome
Function lives in the network: regional damage propagates through the connections that survive

Same lesion, same network position, but a hub lesion (left, degree 6) wipes out most of the network’s input, while a peripheral lesion (right, degree 1) barely touches it.

To predict outcome, model the whole network, not a catalogue of regions.

Why dynamics? The brain is never idle

At rest, the brain shows structured spontaneous activity, not noise
Resting-state networks are reproducible across subjects and species
Tasks are perturbations of this ongoing dynamical state
Like a tennis player at the baseline: never still, poised for the next service

A model has to reproduce the resting repertoire, not only stimulus responses.

Resting activity for 8 regions in 3 modules. Top: traces wiggle in sync within each module, decoupled across them. Bottom: the resulting FC matrix shows three clear blocks.

Outputs match what we measure

TVB outputs what scanners record: EEG · MEG · SEEG · fMRI BOLD
Each modality is a forward model (\(h\)) on the same simulated state
Timescales come for free, EEG/SEEG in ms, BOLD in seconds

Direct comparison with data, no translation gap.

One simulated state, three observables. Top: EEG/MEG-like fast oscillations. Middle: SEEG with sharp transients. Bottom: BOLD, the slow envelope through the HRF.

A short history of TVB

From scans to a simulator: the TVB recipe

Ingredient	Symbol	Source
Connectome (weights, delays)	\(A_{ij},\ \tau_{ij}\)	DTI / tractography
Local dynamics	\(f\)	Neural mass / mean-field model
Coupling	\(g\)	Network propagation, with delays
Observation	\(h\)	BOLD / EEG / MEG forward model

The next slides unpack each building block.

Building Block 1: The Connectome

Diffusion MRI → tractography → parcellation → connectivity matrix. The output is the structural backbone \(A_{ij}\) (weights) and \(\tau_{ij}\) (delays).

Building Block 2: Dynamical Systems

A mass \(m\) on a spring with stiffness \(k\).
Perpetual oscillation at \(\omega = \sqrt{k/m}\).

State Equations: \[ \dot{x} = v \] \[ \dot{v} = - \frac{k}{m}*x \]

State Variables

Variable	Initial Value	Unit	Description
\(x\)	2.0	\(\mathrm{m}\)	Displacement from equilibrium
\(v\)	0.0	\(\frac{\mathrm{m}}{\mathrm{s}}\)	Velocity

Parameters

Parameter	Value	Unit	Description
\(k\)	0.0001	\(\frac{\mathrm{N}}{\mathrm{m}}\)	Spring stiffness constant
\(m\)	1.0	\(\mathrm{kg}\)	Mass of the oscillating body

Initial Conditions

Same spring-mass system as before.

What changes?

Initial displacement only: \(x_0 \in \{-1,-2,-3\}\)
Velocity stays fixed at \(v_0 = 0\)
Parameters stay fixed: \(k=10^{-4},\; m=1\)

What do we see?

Same frequency in all cases
Larger initial stretch gives larger amplitude
Closed motion stays perfectly periodic

Phase Space

Same spring-mass system. Only the representation changes.

What changes?

Plot state as \((x,v)\) instead of \(x(t)\)
Use several initial conditions
Add the vector field of the dynamics

What do we see?

Closed trajectories around equilibrium
Conservative dynamics with no damping
Geometry of the flow instead of time evolution

Parameters — Mass

Same equations and same initial condition. Only the mass changes.

What changes?

\(m \in \{0.5, 1.0, 2.0\}\)
\(x_0=-2\), \(v_0=0\) stay fixed
\(k\) stays fixed

What do we see?

Frequency follows \(\omega = \sqrt{k/m}\)
Heavier masses oscillate more slowly
Box size encodes the mass value visually

Towards Realism — Damping & Load

Now we modify the toy model toward a more realistic oscillator.

What changes?

\[ \dot{v} = -\frac{k}{m}x - \frac{c}{m}v + g_{\mathrm{eff}} \]

Add damping: \(c=0.006\)
Add constant load: \(g_{\mathrm{eff}}=0.00015\)

What do we see?

Oscillations decay over time
Equilibrium shifts away from zero
Compare ideal toy model vs damped + load

Phase Space: Damped Spring

Spring Oscillator

Damped Spring

Towards Brain Dynamics: Mean Field Models

Mean field models & Neural Masses

Population rate: \(r(t) = \langle \#\text{spikes}\rangle / (N\,\Delta t)\)
Jansen-Rit (3 populations): PC = \(y_1 - y_2\), the EEG-like PSP at pyramidal cells

EEG-Like Activity with the Jansen-Rit-Model

JansenRit1995 Lumped-parameter cortical column model with three neural populations: pyramidal cells, excitatory interneurons, and inhibitory interneurons. Each population is modeled by a PSP block (2nd-order linear filter) and a sigmoidal potential-to-firing-rate transformation.

Derived Variables \[ v_{pyr} = y_{1} - y_{2} \]

State Equations \[ \dot{y_{0}} = y_{3} \] \[ \dot{y_{3}} = - y_{0}*a^{2} - 2*a*y_{3} + A*a*\operatorname{Sigm}{\left(y_{1} - y_{2} \right)} \] \[ \dot{y_{1}} = y_{4} \] \[ \dot{y_{4}} = - y_{1}*a^{2} - 2*a*y_{4} + A*a*\left(p + C_{2}*\operatorname{Sigm}{\left(C_{1}*y_{0} \right)} + K*c_{intercolumn}\right) \] \[ \dot{y_{2}} = y_{5} \] \[ \dot{y_{5}} = - y_{2}*b^{2} - 2*b*y_{5} + B*C_{4}*b*\operatorname{Sigm}{\left(C_{3}*y_{0} \right)} \]

with \[ \operatorname{Sigm}{\left(v \right)} = \frac{2*e_{0}}{1 + e^{r*\left(v_{0} - v\right)}} \]

State Variables

Variable	Initial Value	Unit
\(y_{0}\)	0.1	\(\mathrm{mV}\)
\(y_{3}\)	0.1	\(\frac{\mathrm{mV}}{\mathrm{s}}\)
\(y_{1}\)	0.1	\(\mathrm{mV}\)
\(y_{4}\)	0.1	\(\frac{\mathrm{mV}}{\mathrm{s}}\)
\(y_{2}\)	0.1	\(\mathrm{mV}\)
\(y_{5}\)	0.1	\(\frac{\mathrm{mV}}{\mathrm{s}}\)

Parameters

Parameter	Value	Unit	Description
\(A\)	3.25	\(\mathrm{mV}\)	Maximum amplitude of excitatory PSP (EPSP)
\(B\)	22	\(\mathrm{mV}\)	Maximum amplitude of inhibitory PSP (IPSP)
\(C\)	135	—	Lumped connectivity constant (C = C1)
\(a\)	100	\(\mathrm{s}^{-1}\)	Reciprocal of EPSP time constant
\(b\)	50	\(\mathrm{s}^{-1}\)	Reciprocal of IPSP time constant
\(v_{0}\)	6	\(\mathrm{mV}\)	PSP for 50% firing rate (sigmoid midpoint)
\(e_{0}\)	2.5	\(\mathrm{s}^{-1}\)	Half of maximum firing rate
\(r\)	0.56	\(\mathrm{mV}^{-1}\)	Steepness of sigmoid transformation
\(p\)	220	\(\mathrm{s}^{-1}\)	Mean external input (afferent pulse density)
\(K\)	0	—	Per-node coupling strength scaling (K[i] scales coupling received by node i)

Derived Parameters

\[ C_{1} = C \] \[ C_{2} = 0.8*C \] \[ C_{3} = 0.25*C \] \[ C_{4} = 0.25*C \]

Building Block 3: Coupling → Network Dynamics

Uncoupled system

Kuramoto (1975)

Phase oscillator model. Each node evolves with its own natural frequency \(\omega\); coupling \(I\) is added by the network.

State Equations \[\dot{\theta} = \omega\]

State Variables

Variable	Initial value	Unit
\(\theta\)	\(\sim\mathcal{U}(0,2\pi)\)	—

Derived Variables \[activity = \sin(\theta)\]

Parameters

Parameter	Value	Unit	Description
\(\omega\)	1.0	—	Natural frequency

Simulation settings

Duration	100 s
Step size	0.01 s
Integrator	Heun
Nodes	3 (V1, V2, MT)
Coupling \(K\)	0.0 (uncoupled)

Coupled system

Kuramoto (1975) — same model as before, now with coupling enabled.

State Equations \[\dot{\theta} = I + \omega\]

Coupling: \(I\)

\[ I = \frac{K \sum_{j=0}^{-1 + N} - {g}_{i,j} \sin{\left(\theta_{i} - \theta_{j} \right)}}{3} \]

Receives \(\theta\) from connected regions.

Pre-synaptic: \(c_{\text{pre}} = \sin(\theta_j - \theta_i)\)

Post-synaptic: \(c_{\text{post}} = \dfrac{K \cdot gx}{3}\)

Coupling parameters

Parameter	Value	Description
\(K\)	0.4	Global coupling strength

Network

Edge	Weight
V1 → V2	0.7
V2 → MT	0.4
V1 → MT	0.2

Simulation settings

Duration	100 s
Step size	0.01 s
Integrator	Heun

Mathematical framework

\[dS_i = \left[f_d(S_i, \theta^d, C_i, I_i) \right]dt + g(S_i, \theta^g)\, dW_i\]

State Evolution

\(S_i\) - state variables at node \(i\)
\(f_d\) - dynamics function with parameters \(\theta^d\)
\(C_i\) - coupling input from connected nodes
\(I_i\) - external input (stimulation, driving signals)
\(g\) - noise diffusion coefficient with parameters \(\theta^g\)
\(dW_i\) - Wiener process (stochastic fluctuations)


============================================================
STEP 1: Running simulation...
============================================================
  Simulation period: 10.0 ms, dt: 0.01220703125 ms
  Transient period: 0.0 ms
  Simulation complete.

============================================================
Experiment complete.
============================================================

Coupled Brain Network

\[ C_i = f_c^{\text{post}}\Big(\sum_j A_{ij}\, f_c^{\text{pre}}(S_i, S_j(t-\tau_{ij}), \theta^c), S_i, \theta^c\Big) \]

Coupling

\(f_c^{\text{pre}}\) - pre-aggregation transformation
\(A_{ij}\) - structural connectivity weight (\(j \to i\))
\(\tau_{ij}\) - transmission delay (tract length / conduction speed)
\(f_c^{\text{post}}\) - post-aggregation transformation
\(\theta^c\) - coupling parameters

2. Defining a FAIR brain network model

FAIR Guiding Principles

(The Turing Way Community and Scriberia 2024)

FAIR Principles

(The Turing Way Community and Scriberia 2024)

A Simulation Experiment in Computational Neuroscience

SimulationExperiment

Network

Parcellation ⟶ Node ⟶ Edge

Dynamics

StateVariable Parameter Equation DerivedVariable DerivedParameter Function

Coupling

Parameter Equation IncomingState LocalState

Integration

Method Noise InitialCondition TimeSeries

Event

Equation Target

Stimulus

Waveform

Observation

Source Pipeline ⟶ DataSource DerivedObs.

Exploration

Exploration Algorithm

Analysis

Optimization Continuation

Environment

SoftwareEnv. Execution Reference Author

Model Specification with TVB-O

Metadata Schema

id: 21
model: Dynamics
dynamics:
  name: Dynamics
  parameters:
    a:
      name: a
      value: 1.0
  state_variables:
    x:
      name: x
      equation:
        rhs: a*x
      initial_value: 1.0
integration:
  method: Heun
  time_scale: ms
  duration: 4.0
  step_size: 0.01
explorations:
  name: a_sweep
  space:
  - parameter: a
    explored_values:
    - 1.0
    - 0.0
    - -1.0

Python API

from tvbo import DynamicalSystem, SimulationExperiment
from tvbo.datamodel.schema import Exploration, ExplorationAxis

bsplot.style.use("tvbo")

g = DynamicalSystem(
    state_variables={"x": {
                          "equation": {"rhs": "a*x"},
                          "initial_value": 1.0}},
    parameters={"a": {"value": 1.0}},
)

exp = SimulationExperiment(dynamics=g)

exp.explorations = Exploration(
    name="a_sweep",
    space=ExplorationAxis(
      parameter="a",
      explored_values=[1.0, 0, -1.0]
      ),
)

exp.integration.duration = 4.0
exp.integration.step_size = 0.01

res = exp.run("tvboptim")

Specifying an experiment with TVB-O

What makes specifying brain network models easier?

Common vocabulary: Ontology-backed definitions link terms to persistent identifiers — no more ambiguous symbols across papers
Machine-readable metadata: Compact YAML schema captures everything needed to reproduce a simulation
Curated database: Published models, brain networks, and study configurations — ready to load and run
Code generation: One specification → executable code for TVB, JAX, Julia, and more

Applications in basic research

(Schirner, Deco, and Ritter 2023)

(Koller, Schirner, and Ritter 2024)

3. Setup & First Examples

See setup

4. Clinical Applications

5. Bifurcation Analysis

How to analyse a brain system

Time series \(x(t)\) at fixed \((a, x_0)\)

Phase plane flow geometry at fixed \(a\)

Bifurcation diagram regimes as \(a\) varies

Linear Systems

\[ \dot x = a\,x \quad\Rightarrow\quad x(t) = x_0\,e^{a t} \]

\(x(t)\): State variable
\(x_0\): Initial condition
\(a\): Parameter

Solution: \[ x(t) = x_0\,e^{a t} \]

Equilibrium: \[ \dot{x}(t) = ax(t) = 0? \] \[ \quad\Rightarrow\quad x_{eq} = 0 \]

2D portraits

Regime-change building blocks

Top: equilibria as \(a\) varies.
Bottom: local flow \(\dot x=f(x)\) at the dotted parameter slice.

Hopf: birth of an oscillation

\[ \dot z = (a + i\omega)z - |z|^2z, \qquad r_{\mathrm{cycle}} = \sqrt a \]

fixed point loses stability \(\Rightarrow\) stable oscillation appears

Dynamical Regimes

Numerical continuation

\[ F(x, a)=0, \qquad (x,a)=(x(s),a(s)) \]

follow branches through folds
detect folds and Hopf points
switch to periodic orbits

TVB simulation infused with bifurcation analysis

TVBO + tvboptim simulation of Generic2dOscillator on the Desikan-Killiany network. Nodes are colored by their instantaneous \(V(t)\) and projected onto axial (top) and sagittal views; the regime map places each node at its current \((I_{\mathrm{eff}}, V)\) on the bifurcation diagram.

Bifurcation analysis: hands-on

Notebook: notebooks/4_BifurcationAnalysis/BifurcationAnalysis.qmd
Sweep input \(I\) in a TVB neural mass model
Continue fixed points and periodic orbits
Read the diagram as a map of regimes

result = exp.run("bifurcationkit.jl")

6.. Parameter exploration and inference

Models as computational hypotheses

A brain network model is a hypothesis with knobs. Picking knob values is how we test the hypothesis against data.

\(\theta\): model parameters (the knobs)
\(\hat{y}(\theta)\): simulated observable (FC, power spectrum, BOLD …)
\(y_{\text{obs}}\): empirical observable / data
\(\mathcal{L}\): loss / discrepancy

What is inference for brain models?

Inference: which \(\theta\) are consistent with the data?

What do we need to do?

Define a loss function \(\mathcal{L}(\theta)\)
Test many \(\theta\) configurations on \(\mathcal{L}(\theta)\) to find the “best”

Why inference is hard for brain models?

Expensive simulator: seconds to minutes per run
High-dimensional: when parameters become regional (\(\times N\) nodes)
Non-smooth dynamics: chaos, stochasticity, stiffness
Bumpy observables: FC, FCD, PSD are summary statistics
Parameter degeneracy: many parameter combinations fit about equally well

Every method is a bet about which of these difficulties dominates.

What makes inference for brain models easier?

Faster simulator: Parallelization and Accelerators (GPUs, TPUs, etc.)
Differentiability: Gradient-based optimization
Good Workflows: Building complexity gradually

The example problem: fMRI functional connectivity

Find global coupling \(G\) and recurrence \(w\) so simulated FC matches empirical FC in a Reduced Wong-Wang model on the Desikan-Killiany parcellation.

Optimization over a parameter landscape:
\(\mathcal{L}(\theta) = \mathrm{RMSE}(\hat{FC}(\theta),\, FC_{\text{obs}})\).

two knobs → still easy to picture
note the low-loss valley: many \((w, G)\) pairs fit about equally well → a degeneracy typical of brain network models

Defining the running example in TVB-Optim

TVB-Optim: A JAX based toolbox for whole-brain simulation and inference (Pille et al. 2025)

# 1. Compose the brain network model from graph, dynamics, coupling, noise
graph    = DenseGraph(weights, region_labels=region_labels)
dynamics = ReducedWongWang(w=0.5, I_o=0.32, INITIAL_STATE=(0.3,))
coupling = FastLinearCoupling(local_states=["S"], G=0.5)
noise    = AdditiveNoise(sigma=0.00283, apply_to="S")
network  = Network(dynamics=dynamics, coupling={"instant": coupling},
                   graph=graph, noise=noise)

# 2. prepare() splits what to compute (`model`, pure & jittable) from what to plug in (`state`)
model, state = prepare(network, Heun(), t1=120_000, dt=4.0)

# 3. Observation + loss are plain Python functions of `state`
# Every algorithm in this section calls the same loss(s); only how `state` is varied changes
def observation(state):
    return compute_fc(bold_monitor(model(state)), skip_t=20)

def loss(state):
    return rmse(observation(state), fc_target)

Parallelism: `jax.vmap` and `jax.pmap` under the hood

# Explore a 10x10 = 100 grid over (theta_1, theta_2)
state.coupling.theta_1 = GridAxis(-1.0, 1.0, 10)
state.dynamics.theta_2 = GridAxis(-1.0, 1.0, 10)
losses = ParallelExecution(loss, Space(state, mode="product"), n_pmap=8).run()

Grid search: map the whole landscape

Same idea as the bifurcation diagrams from §3
Wins when: \(\le 2\)–\(3\) parameters, you want to see the landscape
Cost grows as \(N^d\) — unusable beyond a handful of dimensions

# Explore a 32x32 grid space of w and G
grid_state.dynamics.w = GridAxis(0.001, 0.7, 32)
grid_state.coupling.G = GridAxis(0.001, 0.7, 32)
grid = Space(grid_state, mode="product")
losses = ParallelExecution(loss, grid, n_pmap=8).run()

Random search: the surprising upgrade

In more than 2–3 dimensions, random search beats grid for the same budget.

Why? Real loss surfaces have low effective dimensionality, most parameters barely matter. Grid spends its budget evenly across all axes; random samples every axis at every trial (Bergstra and Bengio 2012).

Grid vs. Random search, taken from (Bergstra and Bengio 2012) figure 1

# Explore a 100 samples space of w and G
random_state.dynamics.w = UniformAxis(0.001, 0.7, 100)
random_state.coupling.G = UniformAxis(0.001, 0.7, 100)
samples = Space(random_state, mode="zip", key=jax.random.key(42))
losses = ParallelExecution(loss, samples, n_pmap=8).run()

Random search onto 2D loss surface. 1/10th the evaluation budget compared to grid search.

Quasi-random Sobol: Sensitivity for free

Sobol / Latin Hypercube sequences: low-discrepancy, drop-in replacement for the uniform sampler
Same design powers variance-based sensitivity analysis: Saltelli + Sobol indices quantify which knobs the loss actually depends on

problem = {"num_vars": 3,
           "names":  ["G", "w", "sigma"],
           "bounds": [[0.001, 0.7], [0.001, 0.7], [0.001, 0.01]]}
samples = sobol_sample.sample(problem, 256)   # Saltelli design

state.coupling.G  = DataAxis(samples[:, 0])
state.dynamics.w  = DataAxis(samples[:, 1])
state.noise.sigma = DataAxis(samples[:, 2])
losses = ParallelExecution(loss, Space(state, mode="zip")).run()
Si = sobol_analyze.analyze(problem, np.array(losses))

\(S_T(\sigma) \approx 0\)
\(S_T(G) \approx 0.93\)
\(S_T(w) \approx 0.49\)
\(S_2(G\times w) = 0.46\)

Take-away: the FC loss lives on an effective 2D subspace.

Evolutionary Approaches: adaptive, no gradients

Maintain a population, update a mean and covariance to bias future samples
Naturally parallel: every generation is one DataAxis evaluation
No gradients needed → works on noisy / discontinuous losses
Wins when: ~10–50 parameters

es = cma.CMAEvolutionStrategy([0.05, 0.6], 0.15,
  {"popsize": 16})
while not es.stop():
    pop = np.array(es.ask())
    s.coupling.G = DataAxis(pop[:, 0])
    s.dynamics.w = DataAxis(pop[:, 1])
    losses = ParallelExecution(
        loss, Space(s, mode="zip")).run()
    es.tell(pop.tolist(), np.array(losses).tolist())

CMA-ES on (G, w). **Dots:** population of 16 candidates per generation. **Red ×:** distribution mean. **Red ellipse:** 2-σ covariance. The ellipse rotates onto the valley.

Hitting the wall: why sampling stops scaling

Everything so far - grid, random / Sobol, Genetic algorithms - searches by sampling the simulator.

But a brain network model with regional parameters has \(d \sim 10^2\)–\(10^4\) knobs (e.g. one \(w_i\) per Desikan-Killiany region, plus per-region noise, plus …). No sampling budget reaches that regime.

Automatic differentiation: Gradients (almost) for free

Run forward once, run backward once. Reverse-mode AD sweeps the computation graph in reverse, accumulating \(\partial\mathcal{L}/\partial\theta_i\) for every \(i\) at the same time → \(\text{cost}(\nabla_\theta \mathcal{L}) \approx 3\text{–}30 \times \text{cost}(\mathcal{L})\).

Gradient cost in tvboptim, normalized by one simulation. Reverse-mode (blue) is flat at \(\sim 20\times\) across \(d \in [1, 10^4]\), independent of dimension.

In JAX: jax.grad(loss). One line, gradients with respect to every knob — what made per-region fits tractable.

Gradient descent: The regime-changer

\[ \theta_{t+1} \;=\; \theta_t - \eta \,\nabla_\theta\, \mathcal{L}(\theta_t) \]

The cheapest way to scale to regional / per-node parameter vectors (hundreds–thousands of knobs)
Local: needs a reasonable starting point
Point estimate: Only finds a single good fit

state.dynamics.w = Parameter(state.dynamics.w)
state.coupling.G = Parameter(state.coupling.G)
opt = OptaxOptimizer(loss, optax.adam(0.01))
fitted, _ = opt.run(state, max_steps=200)

Gradient descent on (G, w). White path: 200 Adam iterates from a deliberately off-valley start into the low-loss valley. One trajectory, one optimum → gradient descent finds a good fit cheaply but does not characterize the valley itself.

Bayesian inference — when you want a posterior

A point estimate \(\theta^\star\) tells you one good fit. A posterior tells you all fits consistent with the data.

\[ \underbrace{p(\theta \mid y)}_{\text{posterior}} \;\propto\; \underbrace{p(y \mid \theta)}_{\text{likelihood}} \;\cdot\; \underbrace{p(\theta)}_{\text{prior}} \]

Priors encode what you (already) know

A prior \(p(\theta)\) is a probabilistic statement about parameters before you see this dataset. Three regimes:

Flat / uninformative: Uniform over a physiological range. “I know the bounds, nothing else.” Posterior is dominated by data.
Weakly informative: Normal around a literature value with generous spread. Regularizes, prevents pathological fits.
Strongly informative: derived from independent measurements: anatomy, tau / amyloid PET maps, receptor density, EEG power. “This patient’s region has elevated tau” → prior on local excitation per region.

Same simulator, same data, different priors → different posteriors. The prior is where domain knowledge enters the inference, cleanly and auditably.

HMC vs SVI: same model, two budgets

Same model, same wide priors (\(G, w \sim \mathcal{U}(0.001, 0.7)\)). Different inference algorithm.

HMC (left) and SVI (right) on the same model. Background: grid-scan loss landscape. Red points: posterior samples; red contours: KDE; histograms: marginals. HMC bends along the valley; the SVI Gaussian aligns with it locally but cannot curve.

HMC / NUTS — gold standard.

Asymptotically exact samples; recovers the curved ridge
~4 h for 300 samples · simulator-bound

SVI — fast approximation.

Fits a Gaussian guide \(q_\phi(G, w)\) by gradient descent on the ELBO
Captures local direction of the valley, not its curvature
~12 min — 20× faster

When gradients break: simulation-based inference

When the simulator is chaotic or a black box, sample anyway.

Amortized: pay the simulation budget once, query any new \(y\) in milliseconds.

Gradients available → HMC / SVI. Gradients break → SBI.

Choosing a method: Cheat Sheet

method	scales to	cost (forward sims)	what you get
Grid	\(\le 3\) params	\(N^d\)	full landscape
Random / Sobol	\(\sim 20\)	\(10^2\)–\(10^3\)	landscape sketch
CMA-ES / GA	\(\sim 50\)	\(10^3\)–\(10^4\)	local optimum
Gradient descent	\(10^4\)+ (regional)	\(10^3\)–\(10^5\)	local optimum
SVI	\(10^4\)+	\(10^4\)–\(10^6\)	approximate posterior
HMC / NUTS	\(10^4\)+, simulator-bound	\(10^5\)–???	(asymptotically-)exact posterior
SBI (NPE / NLE)	\(\sim 10^2\), simulator-bound	\(10^4\)–\(10^6\) upfront, amortized	amortized approximate posterior

Cost numbers assume reverse-mode AD makes a gradient step ≈ one forward sim. Bayesian rows sit at the expensive end, a posterior is worth roughly a thousand point estimates’ worth of compute.

Chaos eats your gradient

Not a bug in automatic differentiation, the dynamics are telling you the loss is non-smooth at fine scales.

Mitigations

integrate over shorter windows
smooth observables
multiple shooting
gradient clipping

Near a bifurcation, the AD gradient (b) diverges from the finite-difference ground truth.

The TVB-Optim workflow

Bifurcation map (§3) → pick a regime of interest
Coarse random / Sobol search within that regime → find a basin
Gradient descent from the basin → polish to a local optimum
Optional: Bayesian inference (HMC, SVI, or SBI when gradients break) from the optimum → posterior + uncertainty

→ each step uses the output of the previous as a warm start.

Use-cases

I — fMRI BOLD FC

Reduced Wong-Wang → match empirical FC.

Loss: RMSE on FC matrices
Path: grid (G, w) → global gradient fit → per-region w
Open notebook

II — MEG Peak Frequency

Jansen-Rit with delays → match the alpha-band gradient.

Target: 7 Hz (association) → 11 Hz (visual cortex)
Loss: \(1 - \text{mean}(\mathrm{corr})\) across regional PSDs
Path: grid (a, b) → per-region (a, b) gradient fit
Open notebook

Both notebooks share the same TVB-Optim idioms (Parameter, .shape, Space, @cache) on different observables.

7. Stimulating the brain

Visual evoked potentials

(Jansen and Rit 1995)

Original

Specification

- id: 6
  label: "VEP: different columns, with delay
          (Fig. 11)"
  dynamics:
    name: JansenRit1995_Delayed_VEP
    parameters:
      A: {value: 3.25, unit: mV}
      B: {value: 22, unit: mV}
      C: {value: 135}
      a: {value: 100, unit: s^-1}
      b: {value: 50, unit: s^-1}
      v0: {value: 6, unit: mV}
      e0: {value: 2.5, unit: s^-1}
      r: {value: 0.56, unit: mV^-1}
      p:
        value: 220
        unit: s^-1
        shape: "(n_nodes,)"
        distribution: *dist_p_uniform
      a_d: {value: 30, unit: s^-1}
      K: {value: 0, shape: "(n_nodes,)"}
    derived_parameters: *jr_derived
    functions: *jr_functions
    coupling_terms: *jr_coupling
    state_variables:
      y0: {equation: {rhs: y3}}
      y3:
        equation:
          rhs: >-
            A*a*Sigm(y1 - y2)
            - 2*a*y3 - a**2*y0
      y1: {equation: {rhs: y4}}
      y4:
        equation:
          rhs: >-
            A*a*(p + P
            + C2*Sigm(C1*y0) + z0)
            - 2*a*y4 - a**2*y1
      y2: {equation: {rhs: y5}}
      y5:
        equation:
          rhs: >-
            B*b*(C4*Sigm(C3*y0))
            - 2*b*y5 - b**2*y2
      z0: {equation: {rhs: z1}}
      z1:
        equation:
          rhs: >-
            A*a_d*K*c_intercolumn
            - 2*a_d*z1 - a_d**2*z0
    derived_variables:
      v_pyr: {equation: {rhs: y1 - y2}}
    output: [v_pyr]
  network:
    number_of_regions: 2
    number_of_nodes: 2
    conduction_speed: {value: 0}
    nodes:
      - {id: 0}
      - id: 1
        parameters:
          B: {value: 17.6}
          C: {value: 108}
    edges:
      - source: 0
        target: 1
        directed: true
      - source: 1
        target: 0
        directed: true
    coupling: *sigmoidal_coupling
  events: *flash_events
  integration: *integration_vep
  execution:
    random_seed: 42
  explorations:
    VEP_delayed_fig11:
      mode: zip
      n_trials: 40
      average: trials
      parameters:
        K:
          element_domains:
            - element: 0
              explored_values: [300, 100]
            - element: 1
              explored_values: [1000, 4000]

Reproduced

Stimulation with Bayesian inference

Hands-on notebook: 5_stimulation_with_bayesian_inference.qmd

Recover stimulus amplitude and excitability from a noisy recording
See the degeneracy ridge in the loss landscape
Use priors as hypotheses to pick a mechanistic explanation
Compare Bayesian posteriors against multi-start gradient descent

End of Day 1

         \(^_^)/

Day 2 — Mini Hackathon and Plot Your Brain Tutorial

Hackathon

pitch your question (5 min each)
form small groups
checkpoint + wrap-up times

Bring your own…

data formats we can support

Brain Plotting with `bsplot`

uv pip install bsplot

Wrap-up

Acknowledgements

Christoph Huettl

Rico Schmitt

Lia Domide

Romina Baila

Jil Meier

Halgurd Taher

Konstantin Buelau

Dionysios Perdikis

Michael Schirner

Leon Stefanovski

Claude Code

Petra Ritter

Thank You & Feedback

Marius Pille
marius.pille [at] bih-charite.de

Leon Martin
leon.martin [at] bih-charite.de

Leon Stefanovski
leon.stefanovski [at] charite.de

References

Bergstra, James, and Yoshua Bengio. 2012. “Random Search for Hyper-Parameter Optimization.” Journal of Machine Learning Research 13 (null): 281–305. https://dl.acm.org/doi/abs/10.5555/2188385.2188395.

Jansen, Ben H., and Vincent G. Rit. 1995. “Electroencephalogram and Visual Evoked Potential Generation in a Mathematical Model of Coupled Cortical Columns.” Biological Cybernetics 73 (4): 357–66. https://doi.org/10.1007/bf00199471.

Koller, Dominik P., Michael Schirner, and Petra Ritter. 2024. “Human Connectome Topology Directs Cortical Traveling Waves and Shapes Frequency Gradients.” Nature Communications 15 (1). https://doi.org/10.1038/s41467-024-47860-x.

Kuramoto, Yoshiki. 1975. “Self-Entrainment of a Population of Coupled Non-Linear Oscillators.” In Lecture Notes in Physics, 420–22. Springer-Verlag. https://doi.org/10.1007/bfb0013365.

Martin, Leon, Konstantin Buelau, Marius Pille, Rico Andre Schmitt, Christoph Huettl, Jil M Meier, Halgurd Taher, Dionysios Perdikis, Michael Schirner, and Petra Ritter. 2025. “The Virtual Brain Ontology: A Digital Knowledge Framework for Reproducible Brain Network Modeling.” bioRxiv, November. https://doi.org/10.1101/2025.11.19.689211.

Pille, Marius, Leon Martin, Emilius Richter, Dionysios Perdikis, Michael Schirner, and Petra Ritter. 2025. “Fast and Easy Whole-Brain Network Model Parameter Estimation with Automatic Differentiation.” bioRxiv, November. https://doi.org/10.1101/2025.11.18.689003.

Schirner, Michael, Gustavo Deco, and Petra Ritter. 2023. “Learning How Network Structure Shapes Decision-Making for Bio-Inspired Computing.” Nature Communications 14 (1). https://doi.org/10.1038/s41467-023-38626-y.

The Turing Way Community, and Scriberia. 2024. “Illustrations from the Turing Way: Shared Under CC-BY 4.0 for Reuse.” Zenodo. https://doi.org/10.5281/zenodo.13882307.

Towards translational and mechanistic whole-brain simulations with The Virtual Brain

Welcome

Schedule

Workshop Website

1. Intro to Brain Network Modelling

Why network models? Same lesion, different outcomes

Why dynamics? The brain is never idle

Outputs match what we measure

A short history of TVB

From scans to a simulator: the TVB recipe

Building Block 1: The Connectome

Building Block 2: Dynamical Systems

Initial Conditions

Phase Space

Parameters — Mass

Towards Realism — Damping & Load

Phase Space: Damped Spring

Towards Brain Dynamics: Mean Field Models

Mean field models & Neural Masses

EEG-Like Activity with the Jansen-Rit-Model

Building Block 3: Coupling → Network Dynamics

Uncoupled system

Coupled system

Mathematical framework

Coupled Brain Network

2. Defining a FAIR brain network model

FAIR Guiding Principles

FAIR Principles

A Simulation Experiment in Computational Neuroscience

A Simulation Experiment in Computational Neuroscience

Model Specification with TVB-O

Specifying an experiment with TVB-O

What makes specifying brain network models easier?

Applications in basic research

3. Setup & First Examples

4. Clinical Applications

5. Bifurcation Analysis

How to analyse a brain system

Linear Systems

2D portraits

2D portraits

Regime-change building blocks

Hopf: birth of an oscillation

Dynamical Regimes

Numerical continuation

TVB simulation infused with bifurcation analysis

Bifurcation analysis: hands-on

6.. Parameter exploration and inference

Models as computational hypotheses

What is inference for brain models?

Why inference is hard for brain models?

What makes inference for brain models easier?

The example problem: fMRI functional connectivity

Defining the running example in TVB-Optim

Parallelism: jax.vmap and jax.pmap under the hood

Grid search: map the whole landscape

Random search: the surprising upgrade

Quasi-random Sobol: Sensitivity for free

Evolutionary Approaches: adaptive, no gradients

Hitting the wall: why sampling stops scaling

Automatic differentiation: Gradients (almost) for free

Gradient descent: The regime-changer

Bayesian inference — when you want a posterior

Priors encode what you (already) know

HMC vs SVI: same model, two budgets

When gradients break: simulation-based inference

Choosing a method: Cheat Sheet

Chaos eats your gradient

The TVB-Optim workflow

Use-cases

7. Stimulating the brain

Visual evoked potentials

Stimulation with Bayesian inference

End of Day 1

Day 2 — Mini Hackathon and Plot Your Brain Tutorial

Hackathon

Bring your own…

Brain Plotting with bsplot

Wrap-up

Acknowledgements

Parallelism: `jax.vmap` and `jax.pmap` under the hood

Brain Plotting with `bsplot`