Simulations
InterSIM bridge utilities and semi-synthetic trajectory generators for multi-omics simulation studies.
R dependency
The bridge is optional and requires Rscript plus the R InterSIM package:
install.packages(
"InterSIM",
repos = c("https://cran.r-universe.dev", "https://cloud.r-project.org")
)
Check availability before running a simulation:
from motco.simulations import check_intersim_available
availability = check_intersim_available()
if not availability.available:
print(availability.message)
Example
from motco.simulations import InterSIMParams, run_intersim
result = run_intersim(
InterSIMParams(
seed=1203,
n_sample=100,
cluster_sample_prop=(0.3, 0.3, 0.4),
delta_methyl=1.0,
delta_expr=1.0,
delta_protein=1.0,
p_dmp=0.1,
)
)
methylation = result.methylation
expression = result.expression
proteomics = result.proteomics
clusters = result.clusters
Semi-synthetic trajectory generation
The semi-synthetic trajectory generator converts an InterSIMResult into a MOTCO-ready dataset with:
- aligned methylation, gene expression, and proteomics matrices
- sample metadata containing
sample_id, group, stage, and cluster
- truth metadata recording trajectory mode, affected features, stage mapping, and generator seed
The first generator uses an explicit clusters-as-stages assumption: sorted InterSIM cluster labels are mapped to ordered integer stages starting at 0. Original cluster labels are preserved in sample metadata.
from motco.simulations import (
InterSIMParams,
SemiSyntheticTrajectoryParams,
generate_semisynthetic_trajectory_from_intersim,
)
dataset = generate_semisynthetic_trajectory_from_intersim(
InterSIMParams(seed=1203, n_sample=120, cluster_sample_prop=(0.3, 0.3, 0.4)),
SemiSyntheticTrajectoryParams(
seed=99,
trajectory_mode="magnitude",
group_effect_size=0.2,
group_ratio=0.5,
prop_affected_features=0.05,
),
)
sample_metadata = dataset.metadata
truth = dataset.truth
Supported trajectory modes:
| Mode |
Injected group-specific pattern |
none |
No group-specific shift; useful for Type I error scenarios |
translation |
Same affected-feature shift in every stage |
magnitude |
Stage-proportional shift along the affected-feature direction |
orientation |
Stage-proportional off-axis shift |
shape |
Non-monotone stage-specific shift; requires at least three stages |
Evaluation harness
The evaluation harness runs one SemiSyntheticTrajectoryDataset through MOTCO integration and trajectory testing. It is the per-replicate layer used before larger Type I error or power grids.
from motco.simulations import (
SimulationEvaluationParams,
evaluate_semisynthetic_trajectory,
)
evaluation = evaluate_semisynthetic_trajectory(
dataset,
SimulationEvaluationParams(
integration_method="concat",
permutations=0,
),
)
observed_delta = evaluation.pair_statistics["delta"]
truth = evaluation.truth_metadata
Supported integration methods:
| Method |
Behavior |
concat |
Column-binds methylation, expression, and proteomics matrices after deterministic per-feature standardization by default |
snf |
Builds per-omic affinity matrices, fuses them with SNF, and uses spectral embedding as the trajectory outcome matrix |
Set permutations=0 for observed statistics only. When permutations > 0, the harness runs RRPP and computes upper-tail empirical p-values with plus-one correction:
p = (1 + count(null >= observed)) / (1 + n_permutations)
The result includes observed delta, angle, and shape matrices, scalar two-group pair statistics, optional p-values, latent matrix metadata, generator truth metadata, runtime metadata, group/stage levels, and the trajectory contrast. Shape pair statistics and p-values are reported as unavailable for datasets with fewer than three stages.
Grid orchestration
The grid orchestration layer enumerates parameter cells, runs local replicates through the evaluation harness, persists one JSONL row per replicate, resumes completed work, and summarizes rejection rates for Type I error or power studies.
from pathlib import Path
from motco.simulations import (
InterSIMParams,
SemiSyntheticTrajectoryParams,
SimulationEvaluationParams,
SimulationRunConfig,
enumerate_type_i_grid,
run_simulation_grid,
summarize_rejection_rates,
)
grid = enumerate_type_i_grid(
baseline_intersim_params=InterSIMParams(seed=1, n_sample=60),
baseline_generator_params=SemiSyntheticTrajectoryParams(seed=2),
evaluation_params=SimulationEvaluationParams(integration_method="concat", permutations=99),
axes={
"intersim.n_sample": [60, 120],
"generator.group_ratio": [0.5, 0.7],
},
n_replicates=3,
base_seed=2026,
)
records = run_simulation_grid(
grid,
config=SimulationRunConfig(output_path=Path("simulation-results.jsonl")),
)
summaries = summarize_rejection_rates(records, alpha=0.05)
Each SimulationCell stores a stable cell_id, phase, InterSIMParams, SemiSyntheticTrajectoryParams, SimulationEvaluationParams, replicate count, base seed, and metadata such as the varied axis. Axis names use explicit namespaces: intersim.<field>, generator.<field>, or evaluation.<field>.
Initial persistence is JSON Lines. Each row records cell and replicate IDs, deterministic seeds, a parameter signature, status, p-values, pair statistics, truth metadata, runtime metadata, cell metadata, and optional error details. With resume=True, completed rows with matching parameter signatures are skipped. A matching cell/replicate with a different parameter signature raises unless overwrite=True.
summarize_rejection_rates groups completed replicate rows by cell and statistic, then reports available replicate count, rejection count, rejection rate, Monte Carlo standard error, and unavailable replicate count. Missing statistics, such as shape p-values for two-stage datasets, remain unavailable rather than being counted as non-significant.
API
motco.simulations.InterSIMParams
dataclass
Parameters for the InterSIM R package.
Parameters left as None are omitted from the R call, allowing InterSIM
defaults to apply. The covariance parameters currently support InterSIM's
native "indep" string option or None.
Source code in src/motco/simulations/intersim.py
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71 | @dataclass(frozen=True)
class InterSIMParams:
"""Parameters for the InterSIM R package.
Parameters left as ``None`` are omitted from the R call, allowing InterSIM
defaults to apply. The covariance parameters currently support InterSIM's
native ``"indep"`` string option or ``None``.
"""
seed: int
n_sample: int | None = None
cluster_sample_prop: tuple[float, ...] | None = None
delta_methyl: float | None = None
delta_expr: float | None = None
delta_protein: float | None = None
p_dmp: float | None = None
p_deg: float | None = None
p_dep: float | None = None
sigma_methyl: str | None = None
sigma_expr: str | None = None
sigma_protein: str | None = None
cor_methyl_expr: float | None = None
cor_expr_protein: float | None = None
|
motco.simulations.InterSIMResult
dataclass
Normalized InterSIM simulation output.
Source code in src/motco/simulations/intersim.py
74
75
76
77
78
79
80
81
82
83 | @dataclass(frozen=True)
class InterSIMResult:
"""Normalized InterSIM simulation output."""
methylation: pd.DataFrame
expression: pd.DataFrame
proteomics: pd.DataFrame
sample_ids: pd.Index
clusters: pd.Series
metadata: dict[str, Any] = field(default_factory=dict)
|
motco.simulations.InterSIMAvailability
dataclass
Availability result for the external R InterSIM dependency.
Source code in src/motco/simulations/intersim.py
| @dataclass(frozen=True)
class InterSIMAvailability:
"""Availability result for the external R InterSIM dependency."""
available: bool
message: str
rscript_path: str | None = None
|
motco.simulations.SemiSyntheticTrajectoryParams
dataclass
Parameters for semi-synthetic trajectory generation.
group_ratio is the target proportion assigned to the first label in
group_labels within every generated stage.
Source code in src/motco/simulations/semisynthetic.py
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37 | @dataclass(frozen=True)
class SemiSyntheticTrajectoryParams:
"""Parameters for semi-synthetic trajectory generation.
``group_ratio`` is the target proportion assigned to the first label in
``group_labels`` within every generated stage.
"""
seed: int
trajectory_mode: TrajectoryMode = "none"
group_effect_size: float = 0.0
group_ratio: float = 0.5
group_labels: tuple[str, str] = ("A", "B")
prop_affected_features: float | Mapping[OmicsLayer, float] = 0.1
affected_features: Mapping[OmicsLayer, Sequence[str]] | None = None
|
motco.simulations.SemiSyntheticTrajectoryDataset
dataclass
MOTCO-ready semi-synthetic trajectory dataset.
Source code in src/motco/simulations/semisynthetic.py
40
41
42
43
44
45
46
47
48 | @dataclass(frozen=True)
class SemiSyntheticTrajectoryDataset:
"""MOTCO-ready semi-synthetic trajectory dataset."""
methylation: pd.DataFrame
expression: pd.DataFrame
proteomics: pd.DataFrame
metadata: pd.DataFrame
truth: dict[str, Any] = field(default_factory=dict)
|
motco.simulations.SimulationEvaluationParams
dataclass
Parameters for evaluating one semi-synthetic trajectory dataset.
Source code in src/motco/simulations/evaluation.py
27
28
29
30
31
32
33
34
35
36
37
38
39 | @dataclass(frozen=True)
class SimulationEvaluationParams:
"""Parameters for evaluating one semi-synthetic trajectory dataset."""
integration_method: IntegrationMethod = "concat"
integration_params: Mapping[str, Any] = field(default_factory=dict)
permutations: int = 0
n_jobs: int | None = 1
seed: int | None = None
progress: bool = False
include_null_distributions: bool = False
group_col: str = "group"
stage_col: str = "stage"
|
motco.simulations.SimulationEvaluationResult
dataclass
Result from evaluating one semi-synthetic trajectory dataset.
Source code in src/motco/simulations/evaluation.py
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78 | @dataclass(frozen=True)
class SimulationEvaluationResult:
"""Result from evaluating one semi-synthetic trajectory dataset."""
observed_deltas: np.ndarray
observed_angles: np.ndarray
observed_shapes: np.ndarray
pair_statistics: dict[str, float]
p_values: dict[str, float]
latent_matrix_metadata: dict[str, Any]
truth_metadata: dict[str, Any]
runtime_metadata: dict[str, Any]
evaluation_params: SimulationEvaluationParams
group_levels: list[str]
stage_levels: list[str]
contrast: list[list[int]]
null_distributions: dict[str, list[float]] | None = None
|
motco.simulations.SimulationTrajectoryDesign
dataclass
Trajectory design objects derived from sample metadata.
Source code in src/motco/simulations/evaluation.py
50
51
52
53
54
55
56
57
58
59 | @dataclass(frozen=True)
class SimulationTrajectoryDesign:
"""Trajectory design objects derived from sample metadata."""
model_full: np.ndarray
model_reduced: np.ndarray
ls_means: np.ndarray
contrast: list[list[int]]
group_levels: list[str]
stage_levels: list[str]
|
motco.simulations.LatentIntegrationResult
dataclass
Integrated latent/outcome matrix and metadata.
Source code in src/motco/simulations/evaluation.py
| @dataclass(frozen=True)
class LatentIntegrationResult:
"""Integrated latent/outcome matrix and metadata."""
matrix: pd.DataFrame
metadata: dict[str, Any]
|
motco.simulations.SimulationCell
dataclass
One simulation parameter cell with one or more replicates.
Source code in src/motco/simulations/grid.py
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56 | @dataclass(frozen=True)
class SimulationCell:
"""One simulation parameter cell with one or more replicates."""
cell_id: str
phase: str
intersim_params: InterSIMParams
generator_params: SemiSyntheticTrajectoryParams
evaluation_params: SimulationEvaluationParams
n_replicates: int = 1
base_seed: int = 0
metadata: Mapping[str, Any] = field(default_factory=dict)
def __post_init__(self) -> None:
if self.n_replicates < 1:
raise SimulationGridError("n_replicates must be at least 1.")
if not self.cell_id:
raise SimulationGridError("cell_id must be a non-empty string.")
if not self.phase:
raise SimulationGridError("phase must be a non-empty string.")
|
motco.simulations.SimulationGrid
dataclass
Collection of simulation cells.
Source code in src/motco/simulations/grid.py
59
60
61
62
63
64
65
66
67
68
69
70 | @dataclass(frozen=True)
class SimulationGrid:
"""Collection of simulation cells."""
cells: tuple[SimulationCell, ...]
metadata: Mapping[str, Any] = field(default_factory=dict)
def __post_init__(self) -> None:
ids = [cell.cell_id for cell in self.cells]
duplicates = sorted({cell_id for cell_id in ids if ids.count(cell_id) > 1})
if duplicates:
raise SimulationGridError(f"cell_id values must be unique; duplicates: {duplicates}.")
|
motco.simulations.SimulationReplicateResult
dataclass
One persisted row for a simulation cell replicate.
Source code in src/motco/simulations/grid.py
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92 | @dataclass(frozen=True)
class SimulationReplicateResult:
"""One persisted row for a simulation cell replicate."""
cell_id: str
phase: str
replicate_index: int
replicate_seed: int
intersim_seed: int
generator_seed: int
evaluation_seed: int | None
parameter_signature: str
status: Literal["completed", "failed"]
p_values: dict[str, float | None] = field(default_factory=dict)
pair_statistics: dict[str, float | None] = field(default_factory=dict)
truth_metadata: dict[str, Any] = field(default_factory=dict)
runtime_metadata: dict[str, Any] = field(default_factory=dict)
cell_metadata: dict[str, Any] = field(default_factory=dict)
error_type: str | None = None
error_message: str | None = None
|
motco.simulations.SimulationRunConfig
dataclass
Runtime options for local grid execution.
Source code in src/motco/simulations/grid.py
111
112
113
114
115
116
117
118
119 | @dataclass(frozen=True)
class SimulationRunConfig:
"""Runtime options for local grid execution."""
output_path: Path | None = None
persistence_format: PersistenceFormat = "jsonl"
resume: bool = True
overwrite: bool = False
error_policy: ErrorPolicy = "raise"
|
motco.simulations.SimulationSummaryResult
dataclass
Rejection-rate summary for one cell and statistic.
Source code in src/motco/simulations/grid.py
95
96
97
98
99
100
101
102
103
104
105
106
107
108 | @dataclass(frozen=True)
class SimulationSummaryResult:
"""Rejection-rate summary for one cell and statistic."""
cell_id: str
phase: str
statistic: str
alpha: float
completed_replicates: int
available_replicates: int
rejected_replicates: int
rejection_rate: float | None
monte_carlo_se: float | None
unavailable_replicates: int
|
motco.simulations.check_intersim_available(rscript='Rscript')
Check whether Rscript and the R InterSIM package are available.
Source code in src/motco/simulations/intersim.py
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131 | def check_intersim_available(rscript: str = "Rscript") -> InterSIMAvailability:
"""Check whether Rscript and the R InterSIM package are available."""
rscript_path = which(rscript)
if rscript_path is None:
return InterSIMAvailability(
available=False,
message=f"Rscript dependency is missing or not on PATH: {rscript}",
)
cmd = [
rscript_path,
"-e",
'quit(status = if (requireNamespace("InterSIM", quietly = TRUE)) 0 else 1)',
]
proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
if proc.returncode != 0:
details = _format_process_output(proc)
return InterSIMAvailability(
available=False,
message="R package InterSIM is not installed or cannot be loaded." + details,
rscript_path=rscript_path,
)
return InterSIMAvailability(
available=True,
message="Rscript and R package InterSIM are available.",
rscript_path=rscript_path,
)
|
motco.simulations.run_intersim(params, *, rscript='Rscript', check_dependency=True)
Invoke R InterSIM and return aligned omics matrices plus cluster metadata.
Source code in src/motco/simulations/intersim.py
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163 | def run_intersim(
params: InterSIMParams,
*,
rscript: str = "Rscript",
check_dependency: bool = True,
) -> InterSIMResult:
"""Invoke R InterSIM and return aligned omics matrices plus cluster metadata."""
if check_dependency:
availability = check_intersim_available(rscript)
if not availability.available:
raise InterSIMDependencyError(availability.message)
rscript_cmd = availability.rscript_path or rscript
else:
rscript_cmd = which(rscript) or rscript
with tempfile.TemporaryDirectory(prefix="motco-intersim-") as tmp:
output_dir = Path(tmp)
proc = subprocess.run(
_build_rscript_command(params, output_dir=output_dir, rscript=rscript_cmd),
capture_output=True,
text=True,
check=False,
)
if proc.returncode != 0:
raise InterSIMRuntimeError(
f"InterSIM R process failed with exit code {proc.returncode}."
f"{_format_process_output(proc)}"
)
return _load_result(output_dir, params=params)
|
motco.simulations.generate_semisynthetic_trajectory(intersim_result, params)
Generate a semi-synthetic trajectory dataset from an InterSIM result.
Source code in src/motco/simulations/semisynthetic.py
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119 | def generate_semisynthetic_trajectory(
intersim_result: InterSIMResult,
params: SemiSyntheticTrajectoryParams,
) -> SemiSyntheticTrajectoryDataset:
"""Generate a semi-synthetic trajectory dataset from an InterSIM result."""
_validate_params(params)
omics: dict[OmicsLayer, pd.DataFrame] = {
"methylation": intersim_result.methylation.copy(deep=True),
"expression": intersim_result.expression.copy(deep=True),
"proteomics": intersim_result.proteomics.copy(deep=True),
}
_validate_intersim_alignment(intersim_result, omics)
stage_mapping = _build_stage_mapping(intersim_result.clusters)
metadata = _build_sample_metadata(intersim_result, stage_mapping)
n_stages = int(metadata["stage"].nunique())
if params.trajectory_mode == "shape" and n_stages < 3:
raise SemiSyntheticTrajectoryError("trajectory_mode='shape' requires at least three stages.")
rng = np.random.default_rng(params.seed)
metadata["group"] = _assign_groups_within_stages(metadata["stage"], params, rng)
affected_features = _select_affected_features(omics, params, rng)
effect_coefficients = _stage_effect_coefficients(params.trajectory_mode, n_stages)
truth: dict[str, Any] = {
"trajectory_mode": params.trajectory_mode,
"group_effect_size": params.group_effect_size,
"group_labels": list(params.group_labels),
"group_ratio": params.group_ratio,
"seed": params.seed,
"stage_mapping": {str(k): v for k, v in stage_mapping.items()},
"stage_assumption": "clusters-as-stages",
"affected_features": affected_features,
"effect_coefficients": effect_coefficients.tolist(),
"effect_vectors": {},
"intersim_metadata": intersim_result.metadata,
}
if params.trajectory_mode != "none" and params.group_effect_size != 0:
for layer in _OMICS_LAYERS:
effect_vector = _effect_vector_for_layer(
layer=layer,
columns=omics[layer].columns,
affected_features=affected_features[layer],
mode=params.trajectory_mode,
)
_apply_group_effect(
matrix=omics[layer],
metadata=metadata,
params=params,
effect_coefficients=effect_coefficients,
effect_vector=effect_vector,
)
truth["effect_vectors"][layer] = {
feature: float(effect_vector.loc[feature])
for feature in affected_features[layer]
}
else:
truth["effect_vectors"] = {layer: {} for layer in _OMICS_LAYERS}
_validate_dataset_alignment(omics, metadata)
return SemiSyntheticTrajectoryDataset(
methylation=omics["methylation"],
expression=omics["expression"],
proteomics=omics["proteomics"],
metadata=metadata,
truth=truth,
)
|
motco.simulations.generate_semisynthetic_trajectory_from_intersim(intersim_params, trajectory_params, *, rscript='Rscript', check_dependency=True)
Invoke InterSIM and generate a semi-synthetic trajectory dataset.
Source code in src/motco/simulations/semisynthetic.py
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136 | def generate_semisynthetic_trajectory_from_intersim(
intersim_params: InterSIMParams,
trajectory_params: SemiSyntheticTrajectoryParams,
*,
rscript: str = "Rscript",
check_dependency: bool = True,
) -> SemiSyntheticTrajectoryDataset:
"""Invoke InterSIM and generate a semi-synthetic trajectory dataset."""
intersim_result = run_intersim(
intersim_params,
rscript=rscript,
check_dependency=check_dependency,
)
return generate_semisynthetic_trajectory(intersim_result, trajectory_params)
|
motco.simulations.integrate_semisynthetic_dataset(dataset, params)
Create a latent/outcome matrix from aligned omics layers.
Source code in src/motco/simulations/evaluation.py
167
168
169
170
171
172
173
174
175
176
177
178
179 | def integrate_semisynthetic_dataset(
dataset: SemiSyntheticTrajectoryDataset,
params: SimulationEvaluationParams,
) -> LatentIntegrationResult:
"""Create a latent/outcome matrix from aligned omics layers."""
_validate_dataset(dataset, params.group_col, params.stage_col)
method = params.integration_method
if method == "concat":
return _concat_integration(dataset, params.integration_params)
if method == "snf":
return _snf_integration(dataset, params.integration_params)
raise SimulationEvaluationError(f"Unsupported integration_method: {method!r}.")
|
motco.simulations.build_simulation_trajectory_design(metadata, *, group_col='group', stage_col='stage')
Build full/reduced model matrices, LS means, and two-group contrast.
Source code in src/motco/simulations/evaluation.py
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216 | def build_simulation_trajectory_design(
metadata: pd.DataFrame,
*,
group_col: str = "group",
stage_col: str = "stage",
) -> SimulationTrajectoryDesign:
"""Build full/reduced model matrices, LS means, and two-group contrast."""
_validate_metadata(metadata, group_col, stage_col)
design_frame = metadata[[group_col, stage_col]].reset_index(drop=True).copy()
design_frame[group_col] = design_frame[group_col].astype(str)
design_frame[stage_col] = design_frame[stage_col].astype(str)
group_levels = sorted(pd.unique(design_frame[group_col]).tolist())
stage_levels = sorted(pd.unique(design_frame[stage_col]).tolist())
if len(group_levels) != 2:
raise SimulationEvaluationError(f"Expected exactly two groups, found {len(group_levels)}: {group_levels}.")
if len(stage_levels) < 2:
raise SimulationEvaluationError(
f"Expected at least two stages, found {len(stage_levels)}: {stage_levels}."
)
_validate_group_stage_combinations(design_frame, group_col, stage_col, group_levels, stage_levels)
model_full = get_model_matrix(design_frame, group_col=group_col, level_col=stage_col, full=True)
model_reduced = get_model_matrix(design_frame, group_col=group_col, level_col=stage_col, full=False)
ls_means = build_ls_means(group_levels, stage_levels, full=True)
n_stages = len(stage_levels)
contrast = [[group_index * n_stages + stage_index for stage_index in range(n_stages)] for group_index in range(2)]
return SimulationTrajectoryDesign(
model_full=model_full,
model_reduced=model_reduced,
ls_means=ls_means,
contrast=contrast,
group_levels=group_levels,
stage_levels=stage_levels,
)
|
motco.simulations.evaluate_semisynthetic_trajectory(dataset, params=None)
Evaluate one semi-synthetic trajectory dataset through MOTCO routines.
Source code in src/motco/simulations/evaluation.py
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164 | def evaluate_semisynthetic_trajectory(
dataset: SemiSyntheticTrajectoryDataset,
params: SimulationEvaluationParams | None = None,
) -> SimulationEvaluationResult:
"""Evaluate one semi-synthetic trajectory dataset through MOTCO routines."""
params = params or SimulationEvaluationParams()
_validate_evaluation_params(params)
start = perf_counter()
latent = integrate_semisynthetic_dataset(dataset, params)
design = build_simulation_trajectory_design(
dataset.metadata,
group_col=params.group_col,
stage_col=params.stage_col,
)
observed_deltas, observed_angles, observed_shapes = estimate_difference(
Y=latent.matrix,
model_matrix=design.model_full,
LS_means=design.ls_means,
contrast=design.contrast,
)
shape_available = len(design.stage_levels) >= 3
pair_statistics = _extract_pair_statistics(
observed_deltas,
observed_angles,
observed_shapes,
shape_available=shape_available,
)
p_values: dict[str, float] = {}
null_distributions: dict[str, list[float]] | None = None
if params.permutations > 0:
dist_delta, dist_angle, dist_shape = RRPP(
latent.matrix,
design.model_full,
design.model_reduced,
design.ls_means,
design.contrast,
permutations=params.permutations,
n_jobs=params.n_jobs,
progress=params.progress,
seed=params.seed,
)
null_values = _extract_null_distributions(
dist_delta,
dist_angle,
dist_shape,
shape_available=shape_available,
)
p_values = {
statistic: _empirical_p_value(null_values[statistic], observed)
for statistic, observed in pair_statistics.items()
if statistic in null_values and np.isfinite(observed)
}
if params.include_null_distributions:
null_distributions = null_values
runtime_seconds = perf_counter() - start
runtime_metadata = {
"runtime_seconds": runtime_seconds,
"permutations": params.permutations,
"n_jobs": params.n_jobs,
"seed": params.seed,
"progress": params.progress,
"shape_available": shape_available,
}
return SimulationEvaluationResult(
observed_deltas=observed_deltas,
observed_angles=observed_angles,
observed_shapes=observed_shapes,
pair_statistics=pair_statistics,
p_values=p_values,
latent_matrix_metadata=latent.metadata,
truth_metadata=dict(dataset.truth),
runtime_metadata=runtime_metadata,
evaluation_params=params,
group_levels=design.group_levels,
stage_levels=design.stage_levels,
contrast=design.contrast,
null_distributions=null_distributions,
)
|
motco.simulations.enumerate_type_i_grid(*, baseline_intersim_params, baseline_generator_params, evaluation_params=None, axes=None, n_replicates=1, base_seed=0)
Enumerate a null Type I grid with baseline plus one-factor-at-a-time axes.
Source code in src/motco/simulations/grid.py
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210 | def enumerate_type_i_grid(
*,
baseline_intersim_params: InterSIMParams,
baseline_generator_params: SemiSyntheticTrajectoryParams,
evaluation_params: SimulationEvaluationParams | None = None,
axes: Mapping[str, Sequence[Any]] | None = None,
n_replicates: int = 1,
base_seed: int = 0,
) -> SimulationGrid:
"""Enumerate a null Type I grid with baseline plus one-factor-at-a-time axes."""
null_generator = replace(baseline_generator_params, trajectory_mode="none", group_effect_size=0.0)
cells = [
make_simulation_cell(
phase="type_i_baseline",
intersim_params=baseline_intersim_params,
generator_params=null_generator,
evaluation_params=evaluation_params,
n_replicates=n_replicates,
base_seed=base_seed,
metadata={"varied_axis": None, "varied_value": None},
)
]
for axis, values in (axes or {}).items():
baseline_value = _get_axis_value(baseline_intersim_params, null_generator, evaluation_params, axis)
for value in values:
if _to_jsonable(value) == _to_jsonable(baseline_value):
continue
intersim_params, generator_params, eval_params = _apply_axis_value(
baseline_intersim_params,
null_generator,
evaluation_params or SimulationEvaluationParams(),
axis,
value,
)
cells.append(
make_simulation_cell(
phase="type_i_ofat",
intersim_params=intersim_params,
generator_params=generator_params,
evaluation_params=eval_params,
n_replicates=n_replicates,
base_seed=base_seed,
metadata={"varied_axis": axis, "varied_value": value},
)
)
return SimulationGrid(cells=tuple(cells), metadata={"grid_type": "type_i"})
|
motco.simulations.enumerate_power_grid(*, baseline_intersim_params, baseline_generator_params, evaluation_params=None, trajectory_modes, effect_sizes, axes=None, n_replicates=1, base_seed=0)
Enumerate power cells from trajectory modes, effect sizes, and optional axes.
Source code in src/motco/simulations/grid.py
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277 | def enumerate_power_grid(
*,
baseline_intersim_params: InterSIMParams,
baseline_generator_params: SemiSyntheticTrajectoryParams,
evaluation_params: SimulationEvaluationParams | None = None,
trajectory_modes: Sequence[str],
effect_sizes: Sequence[float],
axes: Mapping[str, Sequence[Any]] | None = None,
n_replicates: int = 1,
base_seed: int = 0,
) -> SimulationGrid:
"""Enumerate power cells from trajectory modes, effect sizes, and optional axes."""
if not trajectory_modes:
raise SimulationGridError("trajectory_modes must contain at least one mode.")
if not effect_sizes:
raise SimulationGridError("effect_sizes must contain at least one value.")
cells: list[SimulationCell] = []
eval_params = evaluation_params or SimulationEvaluationParams()
for mode, effect_size in itertools.product(trajectory_modes, effect_sizes):
generator = replace(
baseline_generator_params,
trajectory_mode=mode, # type: ignore[arg-type]
group_effect_size=float(effect_size),
)
cells.append(
make_simulation_cell(
phase="power_primary",
intersim_params=baseline_intersim_params,
generator_params=generator,
evaluation_params=eval_params,
n_replicates=n_replicates,
base_seed=base_seed,
metadata={"trajectory_mode": mode, "effect_size": float(effect_size), "varied_axis": None},
)
)
for axis, values in (axes or {}).items():
baseline_value = _get_axis_value(baseline_intersim_params, generator, eval_params, axis)
for value in values:
if _to_jsonable(value) == _to_jsonable(baseline_value):
continue
intersim_params, generator_params, axis_eval_params = _apply_axis_value(
baseline_intersim_params,
generator,
eval_params,
axis,
value,
)
cells.append(
make_simulation_cell(
phase="power_ofat",
intersim_params=intersim_params,
generator_params=generator_params,
evaluation_params=axis_eval_params,
n_replicates=n_replicates,
base_seed=base_seed,
metadata={
"trajectory_mode": mode,
"effect_size": float(effect_size),
"varied_axis": axis,
"varied_value": value,
},
)
)
return SimulationGrid(cells=tuple(cells), metadata={"grid_type": "power"})
|
motco.simulations.run_simulation_replicate(cell, replicate_index, *, evaluator=None, error_policy='raise')
Run one replicate, using a fake evaluator in tests or the default harness in production.
Source code in src/motco/simulations/grid.py
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359 | def run_simulation_replicate(
cell: SimulationCell,
replicate_index: int,
*,
evaluator: Evaluator | None = None,
error_policy: ErrorPolicy = "raise",
) -> SimulationReplicateResult:
"""Run one replicate, using a fake evaluator in tests or the default harness in production."""
_validate_error_policy(error_policy)
replicate_seed = derive_replicate_seed(cell, replicate_index)
intersim_params = replace(cell.intersim_params, seed=replicate_seed)
generator_params = replace(cell.generator_params, seed=replicate_seed)
evaluation_seed = cell.evaluation_params.seed if cell.evaluation_params.seed is not None else replicate_seed
evaluation_params = replace(cell.evaluation_params, seed=evaluation_seed)
evaluator = evaluator or _default_evaluator
start = perf_counter()
try:
result = evaluator(intersim_params, generator_params, evaluation_params)
except Exception as exc:
if error_policy == "raise":
raise
return SimulationReplicateResult(
cell_id=cell.cell_id,
phase=cell.phase,
replicate_index=replicate_index,
replicate_seed=replicate_seed,
intersim_seed=intersim_params.seed,
generator_seed=generator_params.seed,
evaluation_seed=evaluation_params.seed,
parameter_signature=parameter_signature(cell),
status="failed",
runtime_metadata={"runtime_seconds": perf_counter() - start},
cell_metadata=dict(cell.metadata),
error_type=type(exc).__name__,
error_message=str(exc),
)
return SimulationReplicateResult(
cell_id=cell.cell_id,
phase=cell.phase,
replicate_index=replicate_index,
replicate_seed=replicate_seed,
intersim_seed=intersim_params.seed,
generator_seed=generator_params.seed,
evaluation_seed=evaluation_params.seed,
parameter_signature=parameter_signature(cell),
status="completed",
p_values=_finite_mapping(result.p_values),
pair_statistics=_finite_mapping(result.pair_statistics),
truth_metadata=result.truth_metadata,
runtime_metadata=result.runtime_metadata,
cell_metadata=dict(cell.metadata),
)
|
motco.simulations.run_simulation_grid(grid, *, config=None, evaluator=None)
Run a grid locally with optional JSONL persistence and resume support.
Source code in src/motco/simulations/grid.py
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402 | def run_simulation_grid(
grid: SimulationGrid,
*,
config: SimulationRunConfig | None = None,
evaluator: Evaluator | None = None,
) -> list[SimulationReplicateResult]:
"""Run a grid locally with optional JSONL persistence and resume support."""
config = config or SimulationRunConfig()
if config.persistence_format != "jsonl":
raise SimulationGridError("Only JSONL persistence is currently supported.")
completed = (
_completed_index(config.output_path)
if config.output_path and config.resume and not config.overwrite
else {}
)
records: list[SimulationReplicateResult] = []
for cell in grid.cells:
signature = parameter_signature(cell)
for replicate_index in range(cell.n_replicates):
key = (cell.cell_id, replicate_index)
existing_signature = completed.get(key)
if existing_signature is not None:
if existing_signature != signature:
if not config.overwrite:
raise SimulationGridError(
f"Existing result for {cell.cell_id} replicate {replicate_index} has a different "
"parameter signature. Set overwrite=True to rerun."
)
else:
continue
record = run_simulation_replicate(
cell,
replicate_index,
evaluator=evaluator,
error_policy=config.error_policy,
)
records.append(record)
if config.output_path is not None:
append_replicate_results(config.output_path, [record])
return records
|
motco.simulations.read_replicate_results(path)
Read replicate records from a JSONL file.
Source code in src/motco/simulations/grid.py
414
415
416
417
418
419
420
421
422
423
424
425 | def read_replicate_results(path: Path) -> list[SimulationReplicateResult]:
"""Read replicate records from a JSONL file."""
if not path.exists():
return []
records = []
with path.open("r", encoding="utf-8") as handle:
for line in handle:
if not line.strip():
continue
records.append(_replicate_result_from_dict(json.loads(line)))
return records
|
motco.simulations.append_replicate_results(path, records)
Append replicate records to a JSONL file.
Source code in src/motco/simulations/grid.py
405
406
407
408
409
410
411 | def append_replicate_results(path: Path, records: Iterable[SimulationReplicateResult]) -> None:
"""Append replicate records to a JSONL file."""
path.parent.mkdir(parents=True, exist_ok=True)
with path.open("a", encoding="utf-8") as handle:
for record in records:
handle.write(json.dumps(_to_jsonable(record), sort_keys=True, allow_nan=False) + "\n")
|
motco.simulations.summarize_rejection_rates(records, *, alpha=0.05, statistics=('delta', 'angle', 'shape'))
Summarize p-value rejection rates by cell and statistic.
Source code in src/motco/simulations/grid.py
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466 | def summarize_rejection_rates(
records: Sequence[SimulationReplicateResult],
*,
alpha: float = 0.05,
statistics: Sequence[str] = ("delta", "angle", "shape"),
) -> list[SimulationSummaryResult]:
"""Summarize p-value rejection rates by cell and statistic."""
if not (0 < alpha < 1):
raise SimulationGridError("alpha must be between 0 and 1.")
groups: dict[tuple[str, str], list[SimulationReplicateResult]] = {}
for record in records:
if record.status != "completed":
continue
groups.setdefault((record.cell_id, record.phase), []).append(record)
summaries: list[SimulationSummaryResult] = []
for (cell_id, phase), group_records in sorted(groups.items()):
for statistic in statistics:
p_values = [record.p_values.get(statistic) for record in group_records]
available = [float(p) for p in p_values if p is not None and math.isfinite(float(p))]
rejected = sum(p < alpha for p in available)
rate = rejected / len(available) if available else None
se = math.sqrt(rate * (1.0 - rate) / len(available)) if rate is not None else None
summaries.append(
SimulationSummaryResult(
cell_id=cell_id,
phase=phase,
statistic=statistic,
alpha=alpha,
completed_replicates=len(group_records),
available_replicates=len(available),
rejected_replicates=rejected,
rejection_rate=rate,
monte_carlo_se=se,
unavailable_replicates=len(group_records) - len(available),
)
)
return summaries
|