Extra L1 — Reproducibility, Environments, and Research Pipelines¶

This notebook complements Lecture 1: Setup & Python Syntax.

Goal¶

Move from “Python runs on my laptop” to a reproducible research workflow.

What you will do¶

  1. Build a small project structure.
  2. Simulate a reproducibility problem.
  3. Write lightweight validation checks.
  4. Export an environment file and a run log.

Why this matters¶

In research, the key question is not only “can you compute a result?” but also:

  • can you reproduce it next month?
  • can someone else reproduce it?
  • can you detect when a result has silently changed?
In [ ]:
from pathlib import Path
import json
import platform
import random
import sys
from datetime import datetime

PROJECT = Path("extra_l1_project")
# Create the standard folder layout used throughout this notebook
for sub in ["data/raw", "data/clean", "output", "src", "logs"]:
    (PROJECT / sub).mkdir(parents=True, exist_ok=True)

print("Project root:", PROJECT.resolve())
print("Python version:", sys.version.split()[0])
print("Platform:", platform.platform())

1. Create a minimal project manifest¶

A manifest is a small file that records the assumptions a project relies on, such as the random seed and the expected folder layout.

In [ ]:
manifest = {
    "project_name": "extra_track_l1",
    "created_at": datetime.now().isoformat(timespec="seconds"),
    "python_version": sys.version.split()[0],
    "seed": 42,
    "author_role": "student",
    "folders": ["data/raw", "data/clean", "output", "src", "logs"]
}

with open(PROJECT / "project_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)

with open(PROJECT / "project_manifest.json") as f:
    print(f.read())
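
The manifest only pays off if later code actually reads it. As a minimal sketch (load_manifest is a hypothetical helper name, not something defined in the lecture), downstream code could pull the recorded seed from the file instead of hard-coding it:

In [ ]:
def load_manifest(path=PROJECT / "project_manifest.json"):
    # Read the manifest back so downstream code reuses the recorded settings
    with open(path) as f:
        return json.load(f)

recorded = load_manifest()
print("Seed recorded in the manifest:", recorded["seed"])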

2. A reproducibility failure: hidden randomness¶

Below is a toy simulation of estimated treatment effects.
Run the cell below several times and note that the printed estimates differ on every run.

In [ ]:
def simulate_effect_unstable(n=500):
    # No explicit seed: output changes every run
    treated = [random.random() < 0.5 for _ in range(n)]
    base = [random.gauss(0, 1) for _ in range(n)]
    outcome = [
        b + 0.3 * int(t) + random.gauss(0, 0.5)
        for b, t in zip(base, treated)
    ]
    treated_mean = sum(y for y, t in zip(outcome, treated) if t) / sum(treated)
    control_mean = sum(y for y, t in zip(outcome, treated) if not t) / (n - sum(treated))
    return treated_mean - control_mean

for _ in range(3):
    print(round(simulate_effect_unstable(), 4))

Task¶

Explain why the result changes and why this is a problem in research code.

In [ ]:
def simulate_effect_stable(n=500, seed=42):
    rng = random.Random(seed)
    treated = [rng.random() < 0.5 for _ in range(n)]
    base = [rng.gauss(0, 1) for _ in range(n)]
    outcome = [
        b + 0.3 * int(t) + rng.gauss(0, 0.5)
        for b, t in zip(base, treated)
    ]
    treated_mean = sum(y for y, t in zip(outcome, treated) if t) / sum(treated)
    control_mean = sum(y for y, t in zip(outcome, treated) if not t) / (n - sum(treated))
    return treated_mean - control_mean

for _ in range(3):
    print(round(simulate_effect_stable(), 4))

3. Save outputs and detect silent changes¶

A simple but powerful habit is to save core results to disk together with metadata, so that later runs can be compared against the stored record (a detection sketch follows the next cell).

In [ ]:
stable_effect = simulate_effect_stable()

run_record = {
    "timestamp": datetime.now().isoformat(timespec="seconds"),
    "effect_estimate": stable_effect,
    "seed": 42,
    "python_version": sys.version.split()[0]
}

with open(PROJECT / "logs" / "run_record.json", "w") as f:
    json.dump(run_record, f, indent=2)

with open(PROJECT / "logs" / "run_record.json") as f:
    print(f.read())
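
With a run record on disk, a later run can be compared against it and any drift can be reported instead of passing unnoticed. A minimal sketch, assuming logs/run_record.json was written by the cell above (the 1e-9 tolerance is an illustrative choice):

In [ ]:
with open(PROJECT / "logs" / "run_record.json") as f:
    previous = json.load(f)

new_effect = simulate_effect_stable()
drift = abs(new_effect - previous["effect_estimate"])

# With the same seed, code, and Python version, the estimates should match exactly
if drift > 1e-9:
    print(f"WARNING: effect changed by {drift:.6f} since the recorded run")
else:
    print("Result matches the stored run record.")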

4. Add a lightweight validation check¶

Research pipelines should fail loudly when assumptions are violated.

In [ ]:
def validate_effect(effect, lower=-0.2, upper=0.8):
    assert lower <= effect <= upper, f"Estimated effect {effect:.3f} outside expected range"

validate_effect(stable_effect)
print("Validation passed.")

5. Environment capture¶

For teaching purposes, the notebook records the versions of the modules currently in memory. The modules used so far all come from the standard library, so they simply report "stdlib"; a third-party package such as numpy would expose a real __version__. This is not a perfect substitute for a lock file, but it is already useful.

In [ ]:
import importlib

# All of these modules are part of the standard library, so they have no __version__
packages = ["json", "platform", "sys", "random"]
versions = {}
for pkg in packages:
    module = importlib.import_module(pkg)
    versions[pkg] = getattr(module, "__version__", "stdlib")

env_snapshot = {
    "python": sys.version.split()[0],
    "packages": versions
}

with open(PROJECT / "environment_snapshot.json", "w") as f:
    json.dump(env_snapshot, f, indent=2)

env_snapshot
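
For third-party dependencies, a fuller capture is to export the installed package list with pip. A minimal sketch, assuming pip is available in the active environment (requirements.txt is just a conventional file name):

In [ ]:
import subprocess

# Pin every installed package to its exact version
result = subprocess.run(
    [sys.executable, "-m", "pip", "freeze"],
    capture_output=True, text=True, check=True
)
(PROJECT / "requirements.txt").write_text(result.stdout)
print("Wrote", PROJECT / "requirements.txt")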

Suggested extension¶

  • Add a src/analysis.py script and move the simulation code there.
  • Create a single run_all.py file that produces all outputs from scratch.
  • Add a second validation test checking that the treatment share is close to 0.5 (a sketch of such a check follows below).
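
As a starting point for the last extension, here is a minimal sketch of such a check (the 0.1 tolerance and the function name are illustrative choices, not part of the lecture material):

In [ ]:
def validate_treatment_share(seed=42, n=500, tolerance=0.1):
    # Recreate the assignment: the first n draws from Random(seed) are exactly
    # the ones simulate_effect_stable uses to assign treatment
    rng = random.Random(seed)
    treated = [rng.random() < 0.5 for _ in range(n)]
    share = sum(treated) / n
    assert abs(share - 0.5) <= tolerance, f"Treatment share {share:.3f} is far from 0.5"
    return share

print("Treatment share:", round(validate_treatment_share(), 3))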

Takeaway¶

Reproducibility is part of the empirical argument. A result that cannot be reproduced is not yet a finished result.