Tutorial 5 — Cluster / MPI Deployment#
This tutorial shows how to run ForMoSA on an HPC cluster using MPI parallelism with PyMultiNest. Two patterns are covered:
Pattern A — nohup (single node, no job scheduler)
Pattern B — SLURM (multi-node, job scheduler)
Both use the same Python run script.
When do you need this?#
The nestle backend runs in a single thread. It is fine for notebooks and
quick checks (≤ 3 free parameters, ≤ 100 live points). For production runs:
| Situation | Recommendation |
|---|---|
| ≤ 3 free parameters | |
| 4–6 free parameters, local machine | |
| 4+ free parameters, cluster available | |
| > 6 free parameters | |
Prerequisites#
Software#
# Install PyMultiNest and mpi4py (see Installation page for full instructions)
pip install pymultinest mpi4py
# The MultiNest library itself must be compiled separately; see docs/installation.rst
# Verify
python -c "import pymultinest; print('PyMultiNest OK')"
python -c "from mpi4py import MPI; print(f'mpi4py OK — {MPI.Get_library_version()}')"
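The two one-liners above can also be folded into a single Python check that reports every missing package at once. This is a minimal sketch using only the standard library; the helper name `check_modules` is hypothetical and not part of ForMoSA. It only confirms that each package can be located, not that the compiled MultiNest library loads correctly.

```python
"""Report which of the packages needed for MPI runs can be found."""
from importlib.util import find_spec


def check_modules(names):
    """Return {module_name: True/False} without importing the modules."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Module names taken from the verification commands in this tutorial.
    for name, found in check_modules(["mpi4py", "pymultinest"]).items():
        print(f"{name:12s} {'OK' if found else 'MISSING'}")
```

If a package is reported as MISSING, the import-based one-liners will fail as well, so fix the installation before submitting a job.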
On the cluster: load required modules#
Exact module names vary by cluster. Typical pattern:
module load openmpi/4.1.5 # or whatever version is available
module load gcc/12.2.0
module load anaconda/2023.09 # or miniconda
conda activate env_formosa
Check available modules with module avail openmpi and module avail gcc.
The Python run script#
Save this as run_formosa.py in your analysis directory. It detects MPI through
mpi4py and runs serially if mpi4py is not installed or only one process is launched.
"""
ForMoSA v2.0: MPI-aware run script.

Usage:
    Single-core:
        python run_formosa.py
    Parallel (PyMultiNest):
        mpirun -np 12 python run_formosa.py
"""
import time
import warnings
from pathlib import Path

warnings.filterwarnings("ignore", category=FutureWarning)
# ── MPI detection ──────────────────────────────────────────────────────────
# Imports must happen BEFORE any ForMoSA import to avoid fork-safety issues.
# MPI initialises the process model; importing multiprocessing-based libraries
# before MPI init can cause deadlocks on some systems.
try:
    from mpi4py import MPI
    COMM = MPI.COMM_WORLD
    RANK = COMM.Get_rank()
    SIZE = COMM.Get_size()
    HAS_MPI = SIZE > 1
except ImportError:
    RANK, SIZE, HAS_MPI = 0, 1, False
    COMM = None

IS_ROOT = (RANK == 0)

# ═══════════════════════════════════════════════════════════════════
# USER CONFIGURATION — edit this block
# ═══════════════════════════════════════════════════════════════════
WORK_PATH = Path("/path/to/your/analysis/directory/")
CONFIG_FILE = "config.ini" # relative to WORK_PATH
FILTER_PATH = Path("~/filters").expanduser() # SVO filter cache
ADAPT = False # True: run adaptation (rank 0 only)
NPOINTS = 500 # live points — override config.ini value
NS_ALGO = "pymultinest"
# ═══════════════════════════════════════════════════════════════════

def log(msg: str) -> None:
    """Print only from rank 0 to avoid garbled output."""
    if IS_ROOT:
        # The trailing space lives in the prefix so serial runs get no leading blank.
        prefix = f"[rank 0/{SIZE}] " if HAS_MPI else ""
        print(f"{prefix}{msg}", flush=True)


def main() -> None:
    t0 = time.time()

    from ForMoSA import Analysis
    from ForMoSA.config.global_config import ConfigLoader, Config_NS
    from ForMoSA.core.config import set_filter_path

    set_filter_path(FILTER_PATH)
    cfg = ConfigLoader(str(WORK_PATH / CONFIG_FILE))
    sections = cfg.load()

    # Override the sampler and live-point count read from config.ini.
    cfg.config["config_inversion"].ns_algo = NS_ALGO
    cfg.config["config_inversion"].npoints = NPOINTS
    config_ns = Config_NS(
        nestle=cfg.config["config_nestle"],
        pymultinest=cfg.config["config_pymultinest"],
        ultranest=cfg.config["config_ultranest"],
    )

    # ── Step 1: Adaptation (rank 0 only) ──────────────────────────────────
    # Adaptation already uses an internal ThreadPool; MPI ranks do not help here.
    if IS_ROOT and ADAPT:
        log("STEP 1: Grid adaptation")
        t1 = time.time()
        analysis = Analysis(cfg.config["config_path"], adapted=False, fitted=False)
        analysis.adapt(cfg.config["config_adapt"], cfg.config["config_inversion"])
        log(f"Adaptation done in {time.time()-t1:.1f}s")

    # Wait for rank 0 to finish adaptation before all ranks enter NS.
    if HAS_MPI:
        COMM.Barrier()

    # ── Step 2: Nested sampling (all ranks participate) ────────────────────
    # PyMultiNest handles inter-rank communication internally via MPI.
    # Every rank creates its own Analysis object pointing to the adapted grid.
    log("STEP 2: Nested sampling")
    t2 = time.time()
    analysis = Analysis(cfg.config["config_path"], adapted=True, fitted=False)
    analysis.nested_sampling(
        cfg.config["config_parameters"],
        cfg.config["config_adapt"],
        cfg.config["config_inversion"],
        config_NS=config_ns,
    )
    log(f"Nested sampling done in {time.time()-t2:.1f}s")  # log() prints on rank 0 only

    # Make sure every rank has finished sampling before rank 0 starts plotting.
    if HAS_MPI:
        COMM.Barrier()

    # ── Step 3: Plotting (rank 0 only) ─────────────────────────────────────
    if IS_ROOT:
        log("STEP 3: Plotting")
        import matplotlib
        matplotlib.use("Agg")  # non-interactive backend for batch runs
        analysis = Analysis(cfg.config["config_path"], adapted=True, fitted=True)
        analysis.plot(analysis.ns.results, plot_native_model=False)

    log(f"Total wall time: {(time.time()-t0)/60:.1f} min")


if __name__ == "__main__":
    main()
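One hazard of running a step on rank 0 only: if rank 0 raises an exception before it reaches `COMM.Barrier()`, the other ranks can wait at the barrier indefinitely and the job idles until it is killed or hits its time limit. A possible guard, sketched below, is to abort the whole MPI job when the root-only task fails. `run_on_root` is a hypothetical helper, not part of ForMoSA; `Abort` and `Barrier` are standard mpi4py communicator methods.

```python
import traceback


def run_on_root(task, comm=None, is_root=True):
    """Run task() on rank 0 only, then synchronise all ranks.

    comm is an mpi4py communicator such as MPI.COMM_WORLD, or None for a
    serial run. If the task fails, the whole MPI job is aborted so that the
    other ranks do not wait forever at the barrier.
    """
    if is_root:
        try:
            task()
        except Exception:
            traceback.print_exc()
            if comm is not None:
                comm.Abort(1)  # asks MPI to terminate every rank in the job
            raise
    if comm is not None:
        comm.Barrier()
```

In the run script this could wrap the adaptation step, for example `run_on_root(adapt_grid, COMM, IS_ROOT)`, where `adapt_grid` is a hypothetical function holding the Step 1 code.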
Pattern A — nohup (single node)#
Use nohup to run in the background and keep the process alive after you
disconnect from the node.
# 1. Load modules (adjust to your cluster)
module load openmpi/4.1.5 gcc/12.2.0
conda activate env_formosa
# 2. Navigate to your analysis directory
cd /path/to/your/analysis/
# 3. Run with MPI — safe ceiling is ~85% of available cores
# (e.g., 12 processes on a 14-core node)
nohup mpirun -np 12 python run_formosa.py > run.log 2>&1 &
# 4. Print the process ID of mpirun; note it down so you can stop the run later
echo "PID: $!"
# 5. Monitor progress in real time
tail -f run.log
# 6. To stop the run
kill <PID>
Note
nohup keeps the process alive even if your SSH session drops. Pair it with
screen or tmux if you want an interactive terminal that survives reconnects.
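The process count for `mpirun -np` can be derived from the core count instead of hard-coded. The sketch below applies the ~85% guideline from step 3; `safe_rank_count` is a hypothetical helper, and rounding to the nearest integer is an assumption chosen so that a 14-core node gives 12 processes, as in the example above.

```python
import os


def safe_rank_count(n_cores, fraction=0.85):
    """Number of MPI processes for about `fraction` of n_cores (at least 1)."""
    return max(1, round(n_cores * fraction))


if __name__ == "__main__":
    # os.cpu_count() reports logical cores, which may exceed the number of
    # physical cores on hyper-threaded nodes, and can return None.
    cores = os.cpu_count() or 1
    print(f"mpirun -np {safe_rank_count(cores)} python run_formosa.py")
```

On shared login or interactive nodes, the cores actually available to you may be fewer than the number reported, so treat the result as an upper bound.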
Pattern B — SLURM (multi-node)#
Use SLURM when you need more than one node or when the cluster requires job scheduling.
Save this as job.sh in your analysis directory:
#!/bin/bash
#SBATCH --job-name=formosa
#SBATCH --nodes=2 # number of nodes
#SBATCH --ntasks-per-node=16 # MPI ranks per node
#SBATCH --time=04:00:00 # wall-clock limit (HH:MM:SS)
#SBATCH --partition=compute # partition / queue name
#SBATCH --output=formosa_%j.log # %j = SLURM job ID
# ── Load environment ────────────────────────────────────────────────────────
module load openmpi/4.1.5 gcc/12.2.0
conda activate env_formosa
# ── Run ─────────────────────────────────────────────────────────────────────
# $SLURM_NTASKS = nodes × ntasks-per-node = total MPI rank count
cd /path/to/your/analysis/
mpirun -np $SLURM_NTASKS python run_formosa.py
Submit and monitor:
# Submit
sbatch job.sh
# Check status
squeue -u $USER
# Monitor output
tail -f formosa_<JOBID>.log
# Cancel
scancel <JOBID>
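A quick sanity check at the start of a job is to compare the MPI world size seen by Python with the task count SLURM allocated. If they differ (for example, every process reports a size of 1 because mpi4py was built against a different MPI library than the one loaded), each rank may run the full analysis independently. The sketch below is an illustration only: `rank_mismatch` is a hypothetical helper, while `SLURM_NTASKS` is the environment variable already used in job.sh.

```python
import os


def rank_mismatch(world_size, env=None):
    """Return a warning string if world_size differs from SLURM_NTASKS.

    Returns None when the counts agree or when SLURM_NTASKS is unset
    (i.e. the script is not running inside a SLURM job).
    """
    env = os.environ if env is None else env
    requested = env.get("SLURM_NTASKS")
    if requested is None or int(requested) == world_size:
        return None
    return f"SLURM allocated {requested} tasks but MPI started {world_size} ranks"


if __name__ == "__main__":
    # In run_formosa.py, pass SIZE instead of the literal 1 used here.
    message = rank_mismatch(1)
    print(message or "rank count consistent (or not running under SLURM)")
```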
Expected speedup#
PyMultiNest scales well up to ~32 MPI ranks for typical ForMoSA problems (4–6 free parameters, 300–500 live points). Beyond that, communication overhead between ranks starts to dominate and speedup flattens.
| Ranks | Approx. speedup vs. single-core |
|---|---|
| 4 | ~3× |
| 8 | ~5× |
| 16 | ~8× |
| 32 | ~12× |
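Conservative rule of thumb for wall time: t_parallel ≈ t_serial / (0.4 × n_ranks). It agrees with the table near 32 ranks (0.4 × 32 ≈ 13×) but underestimates the speedup at low rank counts (0.4 × 4 = 1.6×, against ~3× in the table), so prefer the table values when planning runs on fewer than 32 ranks.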
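To see how the tabulated speedups relate to the 0.4 × n_ranks rule of thumb, it helps to compute the parallel efficiency (speedup divided by rank count) for each row. The sketch below uses only the approximate numbers from the table; the function names are illustrative.

```python
# Approximate speedups from the table above: {ranks: speedup}
TABLE_SPEEDUP = {4: 3.0, 8: 5.0, 16: 8.0, 32: 12.0}


def efficiency(ranks, speedup):
    """Parallel efficiency: fraction of ideal linear speedup achieved."""
    return speedup / ranks


def rule_of_thumb_speedup(ranks, factor=0.4):
    """Speedup implied by t_parallel = t_serial / (factor * ranks)."""
    return factor * ranks


def estimate_wall_time(t_serial, ranks):
    """Wall time from the tabulated speedup (ranks must be a table entry)."""
    return t_serial / TABLE_SPEEDUP[ranks]


if __name__ == "__main__":
    for n, s in TABLE_SPEEDUP.items():
        print(f"{n:2d} ranks: table {s:4.1f}x (efficiency {efficiency(n, s):.2f}), "
              f"rule of thumb {rule_of_thumb_speedup(n):4.1f}x")
```

Efficiency falls from 0.75 at 4 ranks to about 0.38 at 32 ranks, so the constant 0.4 in the rule of thumb corresponds to the high-rank end of the table. As a worked example, a run that takes 120 minutes on a single core would take roughly 120 / 5 = 24 minutes on 8 ranks according to the table.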