Simulations

InterSIM bridge utilities and semi-synthetic trajectory generators for multi-omics simulation studies.

R dependency

The bridge is optional and requires Rscript plus the R InterSIM package:

install.packages(
  "InterSIM",
  repos = c("https://cran.r-universe.dev", "https://cloud.r-project.org")
)

Check availability before running a simulation:

from motco.simulations import check_intersim_available

availability = check_intersim_available()
if not availability.available:
    print(availability.message)

Example

from motco.simulations import InterSIMParams, run_intersim

result = run_intersim(
    InterSIMParams(
        seed=1203,
        n_sample=100,
        cluster_sample_prop=(0.3, 0.3, 0.4),
        delta_methyl=1.0,
        delta_expr=1.0,
        delta_protein=1.0,
        p_dmp=0.1,
    )
)

methylation = result.methylation
expression = result.expression
proteomics = result.proteomics
clusters = result.clusters

Semi-synthetic trajectory generation

The semi-synthetic trajectory generator converts an InterSIMResult into a MOTCO-ready dataset with:

  • aligned methylation, gene expression, and proteomics matrices
  • sample metadata containing sample_id, group, stage, and cluster
  • truth metadata recording trajectory mode, affected features, stage mapping, and generator seed

The generator uses an explicit clusters-as-stages assumption: sorted InterSIM cluster labels are mapped to ordered integer stages starting at 0. The original cluster labels are preserved in the sample metadata.
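
The clusters-as-stages mapping amounts to sorting the unique cluster labels and enumerating them from 0. A minimal pandas sketch (clusters_to_stages is a hypothetical helper, not part of the package):

```python
import pandas as pd

def clusters_to_stages(clusters: pd.Series) -> dict:
    """Map sorted unique cluster labels to ordered integer stages starting at 0."""
    return {label: stage for stage, label in enumerate(sorted(clusters.unique()))}

clusters = pd.Series([2, 1, 3, 1, 2], name="cluster")
mapping = clusters_to_stages(clusters)  # {1: 0, 2: 1, 3: 2}
stages = clusters.map(mapping)          # per-sample stage; cluster labels are kept separately
```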

from motco.simulations import (
    InterSIMParams,
    SemiSyntheticTrajectoryParams,
    generate_semisynthetic_trajectory_from_intersim,
)

dataset = generate_semisynthetic_trajectory_from_intersim(
    InterSIMParams(seed=1203, n_sample=120, cluster_sample_prop=(0.3, 0.3, 0.4)),
    SemiSyntheticTrajectoryParams(
        seed=99,
        trajectory_mode="magnitude",
        group_effect_size=0.2,
        group_ratio=0.5,
        prop_affected_features=0.05,
    ),
)

sample_metadata = dataset.metadata
truth = dataset.truth

Supported trajectory modes:

  • none: no group-specific shift; useful for Type I error scenarios
  • translation: the same affected-feature shift in every stage
  • magnitude: a stage-proportional shift along the affected-feature direction
  • orientation: a stage-proportional off-axis shift
  • shape: a non-monotone stage-specific shift; requires at least three stages
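
For intuition, the stage-wise multipliers behind these modes can be sketched as below. The coefficients are illustrative only and are an assumption about scaling; they need not match the package's internal _stage_effect_coefficients:

```python
import numpy as np

def stage_coefficients(mode: str, n_stages: int) -> np.ndarray:
    """Illustrative per-stage multipliers for the injected group shift."""
    if mode == "none":
        return np.zeros(n_stages)   # no shift anywhere
    if mode == "translation":
        return np.ones(n_stages)    # identical shift in every stage
    if mode == "magnitude":
        # shift grows linearly with stage order (normalized to [0, 1] here)
        return np.arange(n_stages) / max(n_stages - 1, 1)
    raise ValueError(f"unsupported mode for this sketch: {mode!r}")

coeffs = stage_coefficients("magnitude", n_stages=3)  # array([0. , 0.5, 1. ])
```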

Evaluation harness

The evaluation harness runs one SemiSyntheticTrajectoryDataset through MOTCO integration and trajectory testing. It is the per-replicate building block beneath the larger Type I error and power grids.

from motco.simulations import (
    SimulationEvaluationParams,
    evaluate_semisynthetic_trajectory,
)

evaluation = evaluate_semisynthetic_trajectory(
    dataset,
    SimulationEvaluationParams(
        integration_method="concat",
        permutations=0,
    ),
)

observed_delta = evaluation.pair_statistics["delta"]
truth = evaluation.truth_metadata

Supported integration methods:

  • concat: column-binds the methylation, expression, and proteomics matrices after deterministic per-feature standardization (applied by default)
  • snf: builds per-omic affinity matrices, fuses them with SNF, and uses a spectral embedding as the trajectory outcome matrix
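
A minimal sketch of the concat path, assuming the default per-feature standardization is a z-score. concat_integrate is illustrative and is not the package's internal _concat_integration:

```python
import pandas as pd

def concat_integrate(*omics: pd.DataFrame) -> pd.DataFrame:
    """Z-score every feature column, then column-bind the aligned layers."""
    standardized = []
    for block in omics:
        std = block.std(axis=0, ddof=1).replace(0, 1.0)  # guard constant features
        standardized.append((block - block.mean(axis=0)) / std)
    return pd.concat(standardized, axis=1)

samples = pd.Index(["s1", "s2", "s3"], name="sample_id")
methylation = pd.DataFrame({"cpg1": [0.1, 0.5, 0.9]}, index=samples)
expression = pd.DataFrame({"gene1": [10.0, 12.0, 14.0]}, index=samples)
latent = concat_integrate(methylation, expression)  # 3 samples x 2 standardized features
```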

Set permutations=0 for observed statistics only. When permutations > 0, the harness runs RRPP and computes upper-tail empirical p-values with plus-one correction:

p = (1 + count(null >= observed)) / (1 + n_permutations)
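
Expressed directly in code:

```python
import numpy as np

def empirical_p_value(null: np.ndarray, observed: float) -> float:
    """Upper-tail empirical p-value with plus-one correction."""
    return (1 + np.sum(null >= observed)) / (1 + null.size)

# Two of the four null draws are >= 0.8, so p = (1 + 2) / (1 + 4) = 0.6.
p = empirical_p_value(np.array([0.1, 0.5, 0.9, 1.2]), observed=0.8)
```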

The result includes observed delta, angle, and shape matrices, scalar two-group pair statistics, optional p-values, latent matrix metadata, generator truth metadata, runtime metadata, group/stage levels, and the trajectory contrast. Shape pair statistics and p-values are reported as unavailable for datasets with fewer than three stages.

Grid orchestration

The grid orchestration layer enumerates parameter cells, runs local replicates through the evaluation harness, persists one JSONL row per replicate, resumes completed work, and summarizes rejection rates for Type I error or power studies.

from pathlib import Path

from motco.simulations import (
    InterSIMParams,
    SemiSyntheticTrajectoryParams,
    SimulationEvaluationParams,
    SimulationRunConfig,
    enumerate_type_i_grid,
    run_simulation_grid,
    summarize_rejection_rates,
)

grid = enumerate_type_i_grid(
    baseline_intersim_params=InterSIMParams(seed=1, n_sample=60),
    baseline_generator_params=SemiSyntheticTrajectoryParams(seed=2),
    evaluation_params=SimulationEvaluationParams(integration_method="concat", permutations=99),
    axes={
        "intersim.n_sample": [60, 120],
        "generator.group_ratio": [0.5, 0.7],
    },
    n_replicates=3,
    base_seed=2026,
)

records = run_simulation_grid(
    grid,
    config=SimulationRunConfig(output_path=Path("simulation-results.jsonl")),
)
summaries = summarize_rejection_rates(records, alpha=0.05)

Each SimulationCell stores a stable cell_id, phase, InterSIMParams, SemiSyntheticTrajectoryParams, SimulationEvaluationParams, replicate count, base seed, and metadata such as the varied axis. Axis names use explicit namespaces: intersim.<field>, generator.<field>, or evaluation.<field>.

Initial persistence is JSON Lines. Each row records cell and replicate IDs, deterministic seeds, a parameter signature, status, p-values, pair statistics, truth metadata, runtime metadata, cell metadata, and optional error details. With resume=True, completed rows with matching parameter signatures are skipped. A matching cell/replicate with a different parameter signature raises unless overwrite=True.
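
The resume check can be sketched as indexing completed rows by cell and replicate, so matching work is skipped on rerun. This is illustrative only; the library's actual persistence helpers are not shown here:

```python
import json
import tempfile
from pathlib import Path

def completed_keys(path: Path) -> dict:
    """Index completed JSONL rows by (cell_id, replicate_index) -> parameter signature."""
    done = {}
    if path.exists():
        for line in path.read_text().splitlines():
            row = json.loads(line)
            if row.get("status") == "completed":
                done[(row["cell_id"], row["replicate_index"])] = row["parameter_signature"]
    return done

# Tiny demonstration: one completed and one failed replicate row.
path = Path(tempfile.mkdtemp()) / "simulation-results.jsonl"
rows = [
    {"cell_id": "c1", "replicate_index": 0, "status": "completed", "parameter_signature": "sig-a"},
    {"cell_id": "c1", "replicate_index": 1, "status": "failed", "parameter_signature": "sig-a"},
]
path.write_text("\n".join(json.dumps(row) for row in rows))
done = completed_keys(path)  # only the completed replicate is indexed
```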

summarize_rejection_rates groups completed replicate rows by cell and statistic, then reports available replicate count, rejection count, rejection rate, Monte Carlo standard error, and unavailable replicate count. Missing statistics, such as shape p-values for two-stage datasets, remain unavailable rather than being counted as non-significant.
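The Monte Carlo standard error follows the usual binomial form sqrt(p_hat * (1 - p_hat) / n). A sketch, assuming a replicate counts as rejected when its p-value is <= alpha (the exact comparison used by the library is an assumption here):

```python
import math

def rejection_summary(p_values: list, alpha: float = 0.05) -> tuple[float, float]:
    """Rejection rate and Monte Carlo SE over available (non-None) p-values."""
    available = [p for p in p_values if p is not None]
    n = len(available)
    rejected = sum(p <= alpha for p in available)
    rate = rejected / n
    return rate, math.sqrt(rate * (1 - rate) / n)

# Four available p-values (the None mimics an unavailable shape statistic);
# two are <= 0.05, so rate = 0.5 and SE = sqrt(0.25 / 4) = 0.25.
rate, se = rejection_summary([0.01, 0.20, 0.04, None, 0.60])
```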

API

motco.simulations.InterSIMParams dataclass

Parameters for the InterSIM R package.

Parameters left as None are omitted from the R call, allowing InterSIM defaults to apply. The covariance parameters currently support InterSIM's native "indep" string option or None.

Source code in src/motco/simulations/intersim.py
@dataclass(frozen=True)
class InterSIMParams:
    """Parameters for the InterSIM R package.

    Parameters left as ``None`` are omitted from the R call, allowing InterSIM
    defaults to apply. The covariance parameters currently support InterSIM's
    native ``"indep"`` string option or ``None``.
    """

    seed: int
    n_sample: int | None = None
    cluster_sample_prop: tuple[float, ...] | None = None
    delta_methyl: float | None = None
    delta_expr: float | None = None
    delta_protein: float | None = None
    p_dmp: float | None = None
    p_deg: float | None = None
    p_dep: float | None = None
    sigma_methyl: str | None = None
    sigma_expr: str | None = None
    sigma_protein: str | None = None
    cor_methyl_expr: float | None = None
    cor_expr_protein: float | None = None

motco.simulations.InterSIMResult dataclass

Normalized InterSIM simulation output.

Source code in src/motco/simulations/intersim.py
@dataclass(frozen=True)
class InterSIMResult:
    """Normalized InterSIM simulation output."""

    methylation: pd.DataFrame
    expression: pd.DataFrame
    proteomics: pd.DataFrame
    sample_ids: pd.Index
    clusters: pd.Series
    metadata: dict[str, Any] = field(default_factory=dict)

motco.simulations.InterSIMAvailability dataclass

Availability result for the external R InterSIM dependency.

Source code in src/motco/simulations/intersim.py
@dataclass(frozen=True)
class InterSIMAvailability:
    """Availability result for the external R InterSIM dependency."""

    available: bool
    message: str
    rscript_path: str | None = None

motco.simulations.SemiSyntheticTrajectoryParams dataclass

Parameters for semi-synthetic trajectory generation.

group_ratio is the target proportion assigned to the first label in group_labels within every generated stage.

Source code in src/motco/simulations/semisynthetic.py
@dataclass(frozen=True)
class SemiSyntheticTrajectoryParams:
    """Parameters for semi-synthetic trajectory generation.

    ``group_ratio`` is the target proportion assigned to the first label in
    ``group_labels`` within every generated stage.
    """

    seed: int
    trajectory_mode: TrajectoryMode = "none"
    group_effect_size: float = 0.0
    group_ratio: float = 0.5
    group_labels: tuple[str, str] = ("A", "B")
    prop_affected_features: float | Mapping[OmicsLayer, float] = 0.1
    affected_features: Mapping[OmicsLayer, Sequence[str]] | None = None

motco.simulations.SemiSyntheticTrajectoryDataset dataclass

MOTCO-ready semi-synthetic trajectory dataset.

Source code in src/motco/simulations/semisynthetic.py
@dataclass(frozen=True)
class SemiSyntheticTrajectoryDataset:
    """MOTCO-ready semi-synthetic trajectory dataset."""

    methylation: pd.DataFrame
    expression: pd.DataFrame
    proteomics: pd.DataFrame
    metadata: pd.DataFrame
    truth: dict[str, Any] = field(default_factory=dict)

motco.simulations.SimulationEvaluationParams dataclass

Parameters for evaluating one semi-synthetic trajectory dataset.

Source code in src/motco/simulations/evaluation.py
@dataclass(frozen=True)
class SimulationEvaluationParams:
    """Parameters for evaluating one semi-synthetic trajectory dataset."""

    integration_method: IntegrationMethod = "concat"
    integration_params: Mapping[str, Any] = field(default_factory=dict)
    permutations: int = 0
    n_jobs: int | None = 1
    seed: int | None = None
    progress: bool = False
    include_null_distributions: bool = False
    group_col: str = "group"
    stage_col: str = "stage"

motco.simulations.SimulationEvaluationResult dataclass

Result from evaluating one semi-synthetic trajectory dataset.

Source code in src/motco/simulations/evaluation.py
@dataclass(frozen=True)
class SimulationEvaluationResult:
    """Result from evaluating one semi-synthetic trajectory dataset."""

    observed_deltas: np.ndarray
    observed_angles: np.ndarray
    observed_shapes: np.ndarray
    pair_statistics: dict[str, float]
    p_values: dict[str, float]
    latent_matrix_metadata: dict[str, Any]
    truth_metadata: dict[str, Any]
    runtime_metadata: dict[str, Any]
    evaluation_params: SimulationEvaluationParams
    group_levels: list[str]
    stage_levels: list[str]
    contrast: list[list[int]]
    null_distributions: dict[str, list[float]] | None = None

motco.simulations.SimulationTrajectoryDesign dataclass

Trajectory design objects derived from sample metadata.

Source code in src/motco/simulations/evaluation.py
@dataclass(frozen=True)
class SimulationTrajectoryDesign:
    """Trajectory design objects derived from sample metadata."""

    model_full: np.ndarray
    model_reduced: np.ndarray
    ls_means: np.ndarray
    contrast: list[list[int]]
    group_levels: list[str]
    stage_levels: list[str]

motco.simulations.LatentIntegrationResult dataclass

Integrated latent/outcome matrix and metadata.

Source code in src/motco/simulations/evaluation.py
@dataclass(frozen=True)
class LatentIntegrationResult:
    """Integrated latent/outcome matrix and metadata."""

    matrix: pd.DataFrame
    metadata: dict[str, Any]

motco.simulations.SimulationCell dataclass

One simulation parameter cell with one or more replicates.

Source code in src/motco/simulations/grid.py
@dataclass(frozen=True)
class SimulationCell:
    """One simulation parameter cell with one or more replicates."""

    cell_id: str
    phase: str
    intersim_params: InterSIMParams
    generator_params: SemiSyntheticTrajectoryParams
    evaluation_params: SimulationEvaluationParams
    n_replicates: int = 1
    base_seed: int = 0
    metadata: Mapping[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.n_replicates < 1:
            raise SimulationGridError("n_replicates must be at least 1.")
        if not self.cell_id:
            raise SimulationGridError("cell_id must be a non-empty string.")
        if not self.phase:
            raise SimulationGridError("phase must be a non-empty string.")

motco.simulations.SimulationGrid dataclass

Collection of simulation cells.

Source code in src/motco/simulations/grid.py
@dataclass(frozen=True)
class SimulationGrid:
    """Collection of simulation cells."""

    cells: tuple[SimulationCell, ...]
    metadata: Mapping[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        ids = [cell.cell_id for cell in self.cells]
        duplicates = sorted({cell_id for cell_id in ids if ids.count(cell_id) > 1})
        if duplicates:
            raise SimulationGridError(f"cell_id values must be unique; duplicates: {duplicates}.")

motco.simulations.SimulationReplicateResult dataclass

One persisted row for a simulation cell replicate.

Source code in src/motco/simulations/grid.py
@dataclass(frozen=True)
class SimulationReplicateResult:
    """One persisted row for a simulation cell replicate."""

    cell_id: str
    phase: str
    replicate_index: int
    replicate_seed: int
    intersim_seed: int
    generator_seed: int
    evaluation_seed: int | None
    parameter_signature: str
    status: Literal["completed", "failed"]
    p_values: dict[str, float | None] = field(default_factory=dict)
    pair_statistics: dict[str, float | None] = field(default_factory=dict)
    truth_metadata: dict[str, Any] = field(default_factory=dict)
    runtime_metadata: dict[str, Any] = field(default_factory=dict)
    cell_metadata: dict[str, Any] = field(default_factory=dict)
    error_type: str | None = None
    error_message: str | None = None

motco.simulations.SimulationRunConfig dataclass

Runtime options for local grid execution.

Source code in src/motco/simulations/grid.py
@dataclass(frozen=True)
class SimulationRunConfig:
    """Runtime options for local grid execution."""

    output_path: Path | None = None
    persistence_format: PersistenceFormat = "jsonl"
    resume: bool = True
    overwrite: bool = False
    error_policy: ErrorPolicy = "raise"

motco.simulations.SimulationSummaryResult dataclass

Rejection-rate summary for one cell and statistic.

Source code in src/motco/simulations/grid.py
@dataclass(frozen=True)
class SimulationSummaryResult:
    """Rejection-rate summary for one cell and statistic."""

    cell_id: str
    phase: str
    statistic: str
    alpha: float
    completed_replicates: int
    available_replicates: int
    rejected_replicates: int
    rejection_rate: float | None
    monte_carlo_se: float | None
    unavailable_replicates: int

motco.simulations.check_intersim_available(rscript='Rscript')

Check whether Rscript and the R InterSIM package are available.

Source code in src/motco/simulations/intersim.py
def check_intersim_available(rscript: str = "Rscript") -> InterSIMAvailability:
    """Check whether Rscript and the R InterSIM package are available."""

    rscript_path = which(rscript)
    if rscript_path is None:
        return InterSIMAvailability(
            available=False,
            message=f"Rscript dependency is missing or not on PATH: {rscript}",
        )

    cmd = [
        rscript_path,
        "-e",
        'quit(status = if (requireNamespace("InterSIM", quietly = TRUE)) 0 else 1)',
    ]
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    if proc.returncode != 0:
        details = _format_process_output(proc)
        return InterSIMAvailability(
            available=False,
            message="R package InterSIM is not installed or cannot be loaded." + details,
            rscript_path=rscript_path,
        )

    return InterSIMAvailability(
        available=True,
        message="Rscript and R package InterSIM are available.",
        rscript_path=rscript_path,
    )

motco.simulations.run_intersim(params, *, rscript='Rscript', check_dependency=True)

Invoke R InterSIM and return aligned omics matrices plus cluster metadata.

Source code in src/motco/simulations/intersim.py
def run_intersim(
    params: InterSIMParams,
    *,
    rscript: str = "Rscript",
    check_dependency: bool = True,
) -> InterSIMResult:
    """Invoke R InterSIM and return aligned omics matrices plus cluster metadata."""

    if check_dependency:
        availability = check_intersim_available(rscript)
        if not availability.available:
            raise InterSIMDependencyError(availability.message)
        rscript_cmd = availability.rscript_path or rscript
    else:
        rscript_cmd = which(rscript) or rscript

    with tempfile.TemporaryDirectory(prefix="motco-intersim-") as tmp:
        output_dir = Path(tmp)
        proc = subprocess.run(
            _build_rscript_command(params, output_dir=output_dir, rscript=rscript_cmd),
            capture_output=True,
            text=True,
            check=False,
        )
        if proc.returncode != 0:
            raise InterSIMRuntimeError(
                f"InterSIM R process failed with exit code {proc.returncode}."
                f"{_format_process_output(proc)}"
            )
        return _load_result(output_dir, params=params)

motco.simulations.generate_semisynthetic_trajectory(intersim_result, params)

Generate a semi-synthetic trajectory dataset from an InterSIM result.

Source code in src/motco/simulations/semisynthetic.py
def generate_semisynthetic_trajectory(
    intersim_result: InterSIMResult,
    params: SemiSyntheticTrajectoryParams,
) -> SemiSyntheticTrajectoryDataset:
    """Generate a semi-synthetic trajectory dataset from an InterSIM result."""

    _validate_params(params)
    omics: dict[OmicsLayer, pd.DataFrame] = {
        "methylation": intersim_result.methylation.copy(deep=True),
        "expression": intersim_result.expression.copy(deep=True),
        "proteomics": intersim_result.proteomics.copy(deep=True),
    }
    _validate_intersim_alignment(intersim_result, omics)

    stage_mapping = _build_stage_mapping(intersim_result.clusters)
    metadata = _build_sample_metadata(intersim_result, stage_mapping)
    n_stages = int(metadata["stage"].nunique())
    if params.trajectory_mode == "shape" and n_stages < 3:
        raise SemiSyntheticTrajectoryError("trajectory_mode='shape' requires at least three stages.")

    rng = np.random.default_rng(params.seed)
    metadata["group"] = _assign_groups_within_stages(metadata["stage"], params, rng)

    affected_features = _select_affected_features(omics, params, rng)
    effect_coefficients = _stage_effect_coefficients(params.trajectory_mode, n_stages)
    truth: dict[str, Any] = {
        "trajectory_mode": params.trajectory_mode,
        "group_effect_size": params.group_effect_size,
        "group_labels": list(params.group_labels),
        "group_ratio": params.group_ratio,
        "seed": params.seed,
        "stage_mapping": {str(k): v for k, v in stage_mapping.items()},
        "stage_assumption": "clusters-as-stages",
        "affected_features": affected_features,
        "effect_coefficients": effect_coefficients.tolist(),
        "effect_vectors": {},
        "intersim_metadata": intersim_result.metadata,
    }

    if params.trajectory_mode != "none" and params.group_effect_size != 0:
        for layer in _OMICS_LAYERS:
            effect_vector = _effect_vector_for_layer(
                layer=layer,
                columns=omics[layer].columns,
                affected_features=affected_features[layer],
                mode=params.trajectory_mode,
            )
            _apply_group_effect(
                matrix=omics[layer],
                metadata=metadata,
                params=params,
                effect_coefficients=effect_coefficients,
                effect_vector=effect_vector,
            )
            truth["effect_vectors"][layer] = {
                feature: float(effect_vector.loc[feature])
                for feature in affected_features[layer]
            }
    else:
        truth["effect_vectors"] = {layer: {} for layer in _OMICS_LAYERS}

    _validate_dataset_alignment(omics, metadata)
    return SemiSyntheticTrajectoryDataset(
        methylation=omics["methylation"],
        expression=omics["expression"],
        proteomics=omics["proteomics"],
        metadata=metadata,
        truth=truth,
    )

motco.simulations.generate_semisynthetic_trajectory_from_intersim(intersim_params, trajectory_params, *, rscript='Rscript', check_dependency=True)

Invoke InterSIM and generate a semi-synthetic trajectory dataset.

Source code in src/motco/simulations/semisynthetic.py
def generate_semisynthetic_trajectory_from_intersim(
    intersim_params: InterSIMParams,
    trajectory_params: SemiSyntheticTrajectoryParams,
    *,
    rscript: str = "Rscript",
    check_dependency: bool = True,
) -> SemiSyntheticTrajectoryDataset:
    """Invoke InterSIM and generate a semi-synthetic trajectory dataset."""

    intersim_result = run_intersim(
        intersim_params,
        rscript=rscript,
        check_dependency=check_dependency,
    )
    return generate_semisynthetic_trajectory(intersim_result, trajectory_params)

motco.simulations.integrate_semisynthetic_dataset(dataset, params)

Create a latent/outcome matrix from aligned omics layers.

Source code in src/motco/simulations/evaluation.py
def integrate_semisynthetic_dataset(
    dataset: SemiSyntheticTrajectoryDataset,
    params: SimulationEvaluationParams,
) -> LatentIntegrationResult:
    """Create a latent/outcome matrix from aligned omics layers."""

    _validate_dataset(dataset, params.group_col, params.stage_col)
    method = params.integration_method
    if method == "concat":
        return _concat_integration(dataset, params.integration_params)
    if method == "snf":
        return _snf_integration(dataset, params.integration_params)
    raise SimulationEvaluationError(f"Unsupported integration_method: {method!r}.")

motco.simulations.build_simulation_trajectory_design(metadata, *, group_col='group', stage_col='stage')

Build full/reduced model matrices, LS means, and two-group contrast.

Source code in src/motco/simulations/evaluation.py
def build_simulation_trajectory_design(
    metadata: pd.DataFrame,
    *,
    group_col: str = "group",
    stage_col: str = "stage",
) -> SimulationTrajectoryDesign:
    """Build full/reduced model matrices, LS means, and two-group contrast."""

    _validate_metadata(metadata, group_col, stage_col)
    design_frame = metadata[[group_col, stage_col]].reset_index(drop=True).copy()
    design_frame[group_col] = design_frame[group_col].astype(str)
    design_frame[stage_col] = design_frame[stage_col].astype(str)
    group_levels = sorted(pd.unique(design_frame[group_col]).tolist())
    stage_levels = sorted(pd.unique(design_frame[stage_col]).tolist())
    if len(group_levels) != 2:
        raise SimulationEvaluationError(f"Expected exactly two groups, found {len(group_levels)}: {group_levels}.")
    if len(stage_levels) < 2:
        raise SimulationEvaluationError(
            f"Expected at least two stages, found {len(stage_levels)}: {stage_levels}."
        )
    _validate_group_stage_combinations(design_frame, group_col, stage_col, group_levels, stage_levels)

    model_full = get_model_matrix(design_frame, group_col=group_col, level_col=stage_col, full=True)
    model_reduced = get_model_matrix(design_frame, group_col=group_col, level_col=stage_col, full=False)
    ls_means = build_ls_means(group_levels, stage_levels, full=True)
    n_stages = len(stage_levels)
    contrast = [[group_index * n_stages + stage_index for stage_index in range(n_stages)] for group_index in range(2)]
    return SimulationTrajectoryDesign(
        model_full=model_full,
        model_reduced=model_reduced,
        ls_means=ls_means,
        contrast=contrast,
        group_levels=group_levels,
        stage_levels=stage_levels,
    )

motco.simulations.evaluate_semisynthetic_trajectory(dataset, params=None)

Evaluate one semi-synthetic trajectory dataset through MOTCO routines.

Source code in src/motco/simulations/evaluation.py
def evaluate_semisynthetic_trajectory(
    dataset: SemiSyntheticTrajectoryDataset,
    params: SimulationEvaluationParams | None = None,
) -> SimulationEvaluationResult:
    """Evaluate one semi-synthetic trajectory dataset through MOTCO routines."""

    params = params or SimulationEvaluationParams()
    _validate_evaluation_params(params)
    start = perf_counter()

    latent = integrate_semisynthetic_dataset(dataset, params)
    design = build_simulation_trajectory_design(
        dataset.metadata,
        group_col=params.group_col,
        stage_col=params.stage_col,
    )

    observed_deltas, observed_angles, observed_shapes = estimate_difference(
        Y=latent.matrix,
        model_matrix=design.model_full,
        LS_means=design.ls_means,
        contrast=design.contrast,
    )
    shape_available = len(design.stage_levels) >= 3
    pair_statistics = _extract_pair_statistics(
        observed_deltas,
        observed_angles,
        observed_shapes,
        shape_available=shape_available,
    )

    p_values: dict[str, float] = {}
    null_distributions: dict[str, list[float]] | None = None
    if params.permutations > 0:
        dist_delta, dist_angle, dist_shape = RRPP(
            latent.matrix,
            design.model_full,
            design.model_reduced,
            design.ls_means,
            design.contrast,
            permutations=params.permutations,
            n_jobs=params.n_jobs,
            progress=params.progress,
            seed=params.seed,
        )
        null_values = _extract_null_distributions(
            dist_delta,
            dist_angle,
            dist_shape,
            shape_available=shape_available,
        )
        p_values = {
            statistic: _empirical_p_value(null_values[statistic], observed)
            for statistic, observed in pair_statistics.items()
            if statistic in null_values and np.isfinite(observed)
        }
        if params.include_null_distributions:
            null_distributions = null_values

    runtime_seconds = perf_counter() - start
    runtime_metadata = {
        "runtime_seconds": runtime_seconds,
        "permutations": params.permutations,
        "n_jobs": params.n_jobs,
        "seed": params.seed,
        "progress": params.progress,
        "shape_available": shape_available,
    }

    return SimulationEvaluationResult(
        observed_deltas=observed_deltas,
        observed_angles=observed_angles,
        observed_shapes=observed_shapes,
        pair_statistics=pair_statistics,
        p_values=p_values,
        latent_matrix_metadata=latent.metadata,
        truth_metadata=dict(dataset.truth),
        runtime_metadata=runtime_metadata,
        evaluation_params=params,
        group_levels=design.group_levels,
        stage_levels=design.stage_levels,
        contrast=design.contrast,
        null_distributions=null_distributions,
    )

motco.simulations.enumerate_type_i_grid(*, baseline_intersim_params, baseline_generator_params, evaluation_params=None, axes=None, n_replicates=1, base_seed=0)

Enumerate a null Type I grid with baseline plus one-factor-at-a-time axes.

Source code in src/motco/simulations/grid.py
def enumerate_type_i_grid(
    *,
    baseline_intersim_params: InterSIMParams,
    baseline_generator_params: SemiSyntheticTrajectoryParams,
    evaluation_params: SimulationEvaluationParams | None = None,
    axes: Mapping[str, Sequence[Any]] | None = None,
    n_replicates: int = 1,
    base_seed: int = 0,
) -> SimulationGrid:
    """Enumerate a null Type I grid with baseline plus one-factor-at-a-time axes."""

    null_generator = replace(baseline_generator_params, trajectory_mode="none", group_effect_size=0.0)
    cells = [
        make_simulation_cell(
            phase="type_i_baseline",
            intersim_params=baseline_intersim_params,
            generator_params=null_generator,
            evaluation_params=evaluation_params,
            n_replicates=n_replicates,
            base_seed=base_seed,
            metadata={"varied_axis": None, "varied_value": None},
        )
    ]
    for axis, values in (axes or {}).items():
        baseline_value = _get_axis_value(baseline_intersim_params, null_generator, evaluation_params, axis)
        for value in values:
            if _to_jsonable(value) == _to_jsonable(baseline_value):
                continue
            intersim_params, generator_params, eval_params = _apply_axis_value(
                baseline_intersim_params,
                null_generator,
                evaluation_params or SimulationEvaluationParams(),
                axis,
                value,
            )
            cells.append(
                make_simulation_cell(
                    phase="type_i_ofat",
                    intersim_params=intersim_params,
                    generator_params=generator_params,
                    evaluation_params=eval_params,
                    n_replicates=n_replicates,
                    base_seed=base_seed,
                    metadata={"varied_axis": axis, "varied_value": value},
                )
            )
    return SimulationGrid(cells=tuple(cells), metadata={"grid_type": "type_i"})

motco.simulations.enumerate_power_grid(*, baseline_intersim_params, baseline_generator_params, evaluation_params=None, trajectory_modes, effect_sizes, axes=None, n_replicates=1, base_seed=0)

Enumerate power cells from trajectory modes, effect sizes, and optional axes.

Source code in src/motco/simulations/grid.py
def enumerate_power_grid(
    *,
    baseline_intersim_params: InterSIMParams,
    baseline_generator_params: SemiSyntheticTrajectoryParams,
    evaluation_params: SimulationEvaluationParams | None = None,
    trajectory_modes: Sequence[str],
    effect_sizes: Sequence[float],
    axes: Mapping[str, Sequence[Any]] | None = None,
    n_replicates: int = 1,
    base_seed: int = 0,
) -> SimulationGrid:
    """Enumerate power cells from trajectory modes, effect sizes, and optional axes."""

    if not trajectory_modes:
        raise SimulationGridError("trajectory_modes must contain at least one mode.")
    if not effect_sizes:
        raise SimulationGridError("effect_sizes must contain at least one value.")
    cells: list[SimulationCell] = []
    eval_params = evaluation_params or SimulationEvaluationParams()
    for mode, effect_size in itertools.product(trajectory_modes, effect_sizes):
        generator = replace(
            baseline_generator_params,
            trajectory_mode=mode,  # type: ignore[arg-type]
            group_effect_size=float(effect_size),
        )
        cells.append(
            make_simulation_cell(
                phase="power_primary",
                intersim_params=baseline_intersim_params,
                generator_params=generator,
                evaluation_params=eval_params,
                n_replicates=n_replicates,
                base_seed=base_seed,
                metadata={"trajectory_mode": mode, "effect_size": float(effect_size), "varied_axis": None},
            )
        )
        for axis, values in (axes or {}).items():
            baseline_value = _get_axis_value(baseline_intersim_params, generator, eval_params, axis)
            for value in values:
                if _to_jsonable(value) == _to_jsonable(baseline_value):
                    continue
                intersim_params, generator_params, axis_eval_params = _apply_axis_value(
                    baseline_intersim_params,
                    generator,
                    eval_params,
                    axis,
                    value,
                )
                cells.append(
                    make_simulation_cell(
                        phase="power_ofat",
                        intersim_params=intersim_params,
                        generator_params=generator_params,
                        evaluation_params=axis_eval_params,
                        n_replicates=n_replicates,
                        base_seed=base_seed,
                        metadata={
                            "trajectory_mode": mode,
                            "effect_size": float(effect_size),
                            "varied_axis": axis,
                            "varied_value": value,
                        },
                    )
                )
    return SimulationGrid(cells=tuple(cells), metadata={"grid_type": "power"})
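The cell layout produced by `enumerate_power_grid` can be sketched stand-alone: a full factorial over trajectory modes and effect sizes yields the primary cells, and each non-baseline axis value adds one OFAT cell per primary combination. The `enumerate_cells` helper below is hypothetical, written only to illustrate the enumeration order; the real function builds `SimulationCell` objects.

```python
import itertools

def enumerate_cells(modes, effect_sizes, axes, baseline):
    """Hypothetical sketch of the power-grid layout: one primary cell per
    (mode, effect size) pair, plus one-factor-at-a-time (OFAT) cells for
    each axis value that differs from the baseline."""
    cells = []
    for mode, effect in itertools.product(modes, effect_sizes):
        cells.append({"phase": "power_primary", "mode": mode, "effect": effect, "axis": None})
        for axis, values in axes.items():
            for value in values:
                if value == baseline[axis]:
                    continue  # baseline values are skipped, not duplicated as OFAT cells
                cells.append({"phase": "power_ofat", "mode": mode, "effect": effect,
                              "axis": axis, "value": value})
    return cells

cells = enumerate_cells(
    modes=("magnitude", "angle"),
    effect_sizes=(0.2, 0.5),
    axes={"n_sample": (100, 150)},
    baseline={"n_sample": 100},
)
# 4 primary cells, plus one n_sample=150 OFAT cell under each of the 4 primaries
```

Note that the grid grows multiplicatively in modes and effect sizes but only additively in axis values, which keeps OFAT sweeps affordable compared to a full factorial over every axis.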

motco.simulations.run_simulation_replicate(cell, replicate_index, *, evaluator=None, error_policy='raise')

Run one replicate, using a fake evaluator in tests or the default harness in production.

Source code in src/motco/simulations/grid.py
def run_simulation_replicate(
    cell: SimulationCell,
    replicate_index: int,
    *,
    evaluator: Evaluator | None = None,
    error_policy: ErrorPolicy = "raise",
) -> SimulationReplicateResult:
    """Run one replicate, using a fake evaluator in tests or the default harness in production."""

    _validate_error_policy(error_policy)
    replicate_seed = derive_replicate_seed(cell, replicate_index)
    intersim_params = replace(cell.intersim_params, seed=replicate_seed)
    generator_params = replace(cell.generator_params, seed=replicate_seed)
    evaluation_seed = cell.evaluation_params.seed if cell.evaluation_params.seed is not None else replicate_seed
    evaluation_params = replace(cell.evaluation_params, seed=evaluation_seed)
    evaluator = evaluator or _default_evaluator
    start = perf_counter()
    try:
        result = evaluator(intersim_params, generator_params, evaluation_params)
    except Exception as exc:
        if error_policy == "raise":
            raise
        return SimulationReplicateResult(
            cell_id=cell.cell_id,
            phase=cell.phase,
            replicate_index=replicate_index,
            replicate_seed=replicate_seed,
            intersim_seed=intersim_params.seed,
            generator_seed=generator_params.seed,
            evaluation_seed=evaluation_params.seed,
            parameter_signature=parameter_signature(cell),
            status="failed",
            runtime_metadata={"runtime_seconds": perf_counter() - start},
            cell_metadata=dict(cell.metadata),
            error_type=type(exc).__name__,
            error_message=str(exc),
        )

    return SimulationReplicateResult(
        cell_id=cell.cell_id,
        phase=cell.phase,
        replicate_index=replicate_index,
        replicate_seed=replicate_seed,
        intersim_seed=intersim_params.seed,
        generator_seed=generator_params.seed,
        evaluation_seed=evaluation_params.seed,
        parameter_signature=parameter_signature(cell),
        status="completed",
        p_values=_finite_mapping(result.p_values),
        pair_statistics=_finite_mapping(result.pair_statistics),
        truth_metadata=result.truth_metadata,
        runtime_metadata=result.runtime_metadata,
        cell_metadata=dict(cell.metadata),
    )
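The seeding contract in `run_simulation_replicate` is worth making explicit: the derived replicate seed is written into both the InterSIM and generator parameters, while the evaluation seed follows the replicate seed only when the cell has not pinned one. A minimal stand-alone sketch (`resolve_seeds` is a hypothetical helper, not part of motco):

```python
def resolve_seeds(replicate_seed, evaluation_seed=None):
    """Hypothetical sketch of the per-replicate seeding rule: the derived
    replicate seed drives InterSIM and the trajectory generator; the
    evaluation seed tracks it unless the cell pins one explicitly."""
    return {
        "intersim": replicate_seed,
        "generator": replicate_seed,
        "evaluation": evaluation_seed if evaluation_seed is not None else replicate_seed,
    }

default_seeds = resolve_seeds(7)                       # all three seeds follow the replicate
pinned_seeds = resolve_seeds(7, evaluation_seed=42)    # evaluation stays pinned at 42
```

Pinning the evaluation seed is useful when you want permutation noise held fixed across replicates while the simulated data still varies.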

motco.simulations.run_simulation_grid(grid, *, config=None, evaluator=None)

Run a grid locally with optional JSONL persistence and resume support.

Source code in src/motco/simulations/grid.py
def run_simulation_grid(
    grid: SimulationGrid,
    *,
    config: SimulationRunConfig | None = None,
    evaluator: Evaluator | None = None,
) -> list[SimulationReplicateResult]:
    """Run a grid locally with optional JSONL persistence and resume support."""

    config = config or SimulationRunConfig()
    if config.persistence_format != "jsonl":
        raise SimulationGridError("Only JSONL persistence is currently supported.")
    completed = (
        _completed_index(config.output_path)
        if config.output_path and config.resume and not config.overwrite
        else {}
    )
    records: list[SimulationReplicateResult] = []
    for cell in grid.cells:
        signature = parameter_signature(cell)
        for replicate_index in range(cell.n_replicates):
            key = (cell.cell_id, replicate_index)
            existing_signature = completed.get(key)
            if existing_signature is not None:
                if existing_signature != signature:
                    if not config.overwrite:
                        raise SimulationGridError(
                            f"Existing result for {cell.cell_id} replicate {replicate_index} has a different "
                            "parameter signature. Set overwrite=True to rerun."
                        )
                else:
                    continue
            record = run_simulation_replicate(
                cell,
                replicate_index,
                evaluator=evaluator,
                error_policy=config.error_policy,
            )
            records.append(record)
            if config.output_path is not None:
                append_replicate_results(config.output_path, [record])
    return records
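The resume bookkeeping in `run_simulation_grid` keys completed work on `(cell_id, replicate_index)` and guards against silently reusing stale results via the parameter signature. A stand-alone sketch of just that skip-or-error decision (`plan_replicates` and its tuple layout are hypothetical, for illustration only):

```python
def plan_replicates(cells, completed, overwrite=False):
    """Hypothetical sketch of the resume logic: a (cell_id, replicate) pair is
    skipped when its stored parameter signature still matches; a mismatch is
    an error unless overwrite is requested."""
    to_run = []
    for cell_id, n_replicates, signature in cells:
        for i in range(n_replicates):
            stored = completed.get((cell_id, i))
            if stored is not None:
                if stored == signature:
                    continue  # already completed with identical parameters
                if not overwrite:
                    raise ValueError(f"signature mismatch for {cell_id} replicate {i}")
            to_run.append((cell_id, i))
    return to_run

completed = {("cell-a", 0): "sig-a"}                      # previously persisted result
cells = [("cell-a", 2, "sig-a"), ("cell-b", 1, "sig-b")]
to_run = plan_replicates(cells, completed)
# cell-a replicate 0 is skipped; cell-a replicate 1 and cell-b replicate 0 still run
```

Because each record is appended immediately after it completes, an interrupted run can be restarted with `resume=True` and only the missing replicates are recomputed.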

motco.simulations.read_replicate_results(path)

Read replicate records from a JSONL file.

Source code in src/motco/simulations/grid.py
def read_replicate_results(path: Path) -> list[SimulationReplicateResult]:
    """Read replicate records from a JSONL file."""

    if not path.exists():
        return []
    records = []
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            records.append(_replicate_result_from_dict(json.loads(line)))
    return records

motco.simulations.append_replicate_results(path, records)

Append replicate records to a JSONL file.

Source code in src/motco/simulations/grid.py
def append_replicate_results(path: Path, records: Iterable[SimulationReplicateResult]) -> None:
    """Append replicate records to a JSONL file."""

    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(_to_jsonable(record), sort_keys=True, allow_nan=False) + "\n")
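Together, `append_replicate_results` and `read_replicate_results` define a simple persistence format: one sorted-key JSON object per line, appended rather than rewritten so interrupted runs stay resumable. A stand-alone round-trip using only the standard library (the record dict here is a simplified stand-in for a full `SimulationReplicateResult`):

```python
import json
import tempfile
from pathlib import Path

# Minimal sketch of the JSONL persistence format: one JSON object per line,
# written with sort_keys for stable diffs and allow_nan=False so non-finite
# values fail loudly instead of producing invalid JSON.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "results.jsonl"
    records = [{"cell_id": "cell-0", "replicate_index": 0, "status": "completed"}]
    with path.open("a", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, sort_keys=True, allow_nan=False) + "\n")
    # Reading mirrors read_replicate_results: blank lines are skipped.
    loaded = [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()
              if line.strip()]
```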

motco.simulations.summarize_rejection_rates(records, *, alpha=0.05, statistics=('delta', 'angle', 'shape'))

Summarize p-value rejection rates by cell and statistic.

Source code in src/motco/simulations/grid.py
def summarize_rejection_rates(
    records: Sequence[SimulationReplicateResult],
    *,
    alpha: float = 0.05,
    statistics: Sequence[str] = ("delta", "angle", "shape"),
) -> list[SimulationSummaryResult]:
    """Summarize p-value rejection rates by cell and statistic."""

    if not (0 < alpha < 1):
        raise SimulationGridError("alpha must be between 0 and 1.")
    groups: dict[tuple[str, str], list[SimulationReplicateResult]] = {}
    for record in records:
        if record.status != "completed":
            continue
        groups.setdefault((record.cell_id, record.phase), []).append(record)

    summaries: list[SimulationSummaryResult] = []
    for (cell_id, phase), group_records in sorted(groups.items()):
        for statistic in statistics:
            p_values = [record.p_values.get(statistic) for record in group_records]
            available = [float(p) for p in p_values if p is not None and math.isfinite(float(p))]
            rejected = sum(p < alpha for p in available)
            rate = rejected / len(available) if available else None
            se = math.sqrt(rate * (1.0 - rate) / len(available)) if rate is not None else None
            summaries.append(
                SimulationSummaryResult(
                    cell_id=cell_id,
                    phase=phase,
                    statistic=statistic,
                    alpha=alpha,
                    completed_replicates=len(group_records),
                    available_replicates=len(available),
                    rejected_replicates=rejected,
                    rejection_rate=rate,
                    monte_carlo_se=se,
                    unavailable_replicates=len(group_records) - len(available),
                )
            )
    return summaries
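The per-statistic arithmetic inside `summarize_rejection_rates` reduces to a rejection rate over the finite p-values plus a binomial Monte Carlo standard error, sqrt(r * (1 - r) / n). A stand-alone sketch of just that computation (`rejection_summary` is a hypothetical helper mirroring the loop body above):

```python
import math

def rejection_summary(p_values, alpha=0.05):
    """Hypothetical sketch of the per-statistic summary: missing or non-finite
    p-values are excluded before computing the rejection rate and its
    binomial Monte Carlo standard error."""
    available = [float(p) for p in p_values if p is not None and math.isfinite(float(p))]
    rejected = sum(p < alpha for p in available)
    rate = rejected / len(available) if available else None
    se = math.sqrt(rate * (1.0 - rate) / len(available)) if rate is not None else None
    return rate, se

# One None entry is dropped, leaving 3 finite p-values of which 2 fall below alpha.
rate, se = rejection_summary([0.01, 0.2, 0.03, None])
```

The Monte Carlo SE makes it easy to judge whether an observed rejection rate is distinguishable from the nominal alpha given the number of replicates actually available.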