Extra L1 — Reproducibility, Environments, and Research Pipelines¶
This notebook complements Lecture 1: Setup & Python Syntax.
Goal¶
Move from “Python runs on my laptop” to a replicable research workflow.
What you will do¶
- Build a small project structure.
- Simulate a reproducibility problem.
- Write lightweight validation checks.
- Export an environment file and a run log.
Why this matters¶
In research, the key question is not only whether you can compute a result, but also:
- can you reproduce it next month?
- can someone else reproduce it?
- can you detect when a result silently changed?
from pathlib import Path
import json
import platform
import random
import sys
from datetime import datetime
PROJECT = Path("extra_l1_project")
for sub in ["data/raw", "data/clean", "output", "src", "logs"]:
(PROJECT / sub).mkdir(parents=True, exist_ok=True)
print("Project root:", PROJECT.resolve())
print("Python version:", sys.version.split()[0])
print("Platform:", platform.platform())
1. Create a minimal project manifest¶
A manifest is a small file that records the key assumptions and settings a project relies on, here the seed, the Python version, and the folder layout.
manifest = {
"project_name": "extra_track_l1",
"created_at": datetime.now().isoformat(timespec="seconds"),
"python_version": sys.version.split()[0],
"seed": 42,
"author_role": "student",
"folders": ["data/raw", "data/clean", "output", "src", "logs"]
}
with open(PROJECT / "project_manifest.json", "w") as f:
json.dump(manifest, f, indent=2)
with open(PROJECT / "project_manifest.json") as f:
print(f.read())
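As an illustration of how the manifest can be consumed, a later script could read the recorded seed back instead of hard-coding it; a minimal sketch:
# Read the manifest back and reuse its recorded seed in later steps.
with open(PROJECT / "project_manifest.json") as f:
    loaded_manifest = json.load(f)

SEED = loaded_manifest["seed"]
print("Seed from manifest:", SEED)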
2. A reproducibility failure: hidden randomness¶
Below is a toy simulation of estimated treatment effects.
Run the first function several times and observe the output.
def simulate_effect_unstable(n=500):
# No explicit seed: output changes every run
treated = [random.random() < 0.5 for _ in range(n)]
base = [random.gauss(0, 1) for _ in range(n)]
outcome = [
b + 0.3 * int(t) + random.gauss(0, 0.5)
for b, t in zip(base, treated)
]
treated_mean = sum(y for y, t in zip(outcome, treated) if t) / sum(treated)
control_mean = sum(y for y, t in zip(outcome, treated) if not t) / (n - sum(treated))
return treated_mean - control_mean
for _ in range(3):
print(round(simulate_effect_unstable(), 4))
Task¶
Explain why the result changes from run to run and why this is a problem in research code. The version below fixes the issue by drawing all random numbers from a dedicated random.Random generator initialized with a fixed seed.
def simulate_effect_stable(n=500, seed=42):
rng = random.Random(seed)
treated = [rng.random() < 0.5 for _ in range(n)]
base = [rng.gauss(0, 1) for _ in range(n)]
outcome = [
b + 0.3 * int(t) + rng.gauss(0, 0.5)
for b, t in zip(base, treated)
]
treated_mean = sum(y for y, t in zip(outcome, treated) if t) / sum(treated)
control_mean = sum(y for y, t in zip(outcome, treated) if not t) / (n - sum(treated))
return treated_mean - control_mean
for _ in range(3):
print(round(simulate_effect_stable(), 4))
3. Save outputs and detect silent changes¶
A simple but powerful habit is to save core results to disk with metadata.
stable_effect = simulate_effect_stable()
run_record = {
"timestamp": datetime.now().isoformat(timespec="seconds"),
"effect_estimate": stable_effect,
"seed": 42,
"python_version": sys.version.split()[0]
}
with open(PROJECT / "logs" / "run_record.json", "w") as f:
json.dump(run_record, f, indent=2)
with open(PROJECT / "logs" / "run_record.json") as f:
print(f.read())
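The saved record also makes silent changes detectable: re-run the estimate and compare it with the last recorded value. A minimal sketch; the tolerance of 1e-9 is an illustrative assumption:
# Compare a fresh estimate against the previously saved run record.
with open(PROJECT / "logs" / "run_record.json") as f:
    previous = json.load(f)

new_effect = simulate_effect_stable()
drift = abs(new_effect - previous["effect_estimate"])

if drift > 1e-9:  # tolerance chosen for illustration only
    print(f"Warning: effect estimate changed by {drift:.6f} since the last recorded run.")
else:
    print("Effect estimate matches the last recorded run.")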
4. Add a lightweight validation check¶
Research pipelines should fail loudly when assumptions are violated.
def validate_effect(effect, lower=-0.2, upper=0.8):
assert lower <= effect <= upper, f"Estimated effect {effect:.3f} outside expected range"
validate_effect(stable_effect)
print("Validation passed.")
5. Environment capture¶
For teaching purposes, the notebook records the versions of the modules it imports; standard-library modules have no __version__ attribute and are reported as "stdlib". This is not a perfect substitute for a lock file, but it is already useful.
import importlib
packages = ["json", "platform", "sys", "random"]
versions = {}
for pkg in packages:
module = importlib.import_module(pkg)
versions[pkg] = getattr(module, "__version__", "stdlib")
env_snapshot = {
"python": sys.version.split()[0],
"packages": versions
}
with open(PROJECT / "environment_snapshot.json", "w") as f:
json.dump(env_snapshot, f, indent=2)
env_snapshot
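For a fuller snapshot of third-party packages, one common approach is to write the output of pip freeze to a requirements file; a minimal sketch, assuming pip is available in the active environment:
import subprocess

# Capture installed package versions and write them next to the manifest.
frozen = subprocess.run(
    [sys.executable, "-m", "pip", "freeze"],
    capture_output=True, text=True, check=True
)
(PROJECT / "requirements_snapshot.txt").write_text(frozen.stdout)
print("Wrote", PROJECT / "requirements_snapshot.txt")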
Suggested extensions¶
- Add a src/analysis.py script and move the simulation code there.
- Create a single run_all.py file that produces all outputs from scratch.
- Add a second validation test checking that the treatment share is close to 0.5 (a sketch follows below).
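A sketch of the suggested treatment-share check, reusing the same seed and sample size as simulate_effect_stable; the 0.1 tolerance is a loose, illustrative choice:
def validate_treatment_share(n=500, seed=42, tolerance=0.1):
    # Re-draw the treatment assignment with the same generator and seed,
    # then check that roughly half of the units are treated.
    rng = random.Random(seed)
    treated = [rng.random() < 0.5 for _ in range(n)]
    share = sum(treated) / n
    assert abs(share - 0.5) <= tolerance, f"Treatment share {share:.3f} is far from 0.5"
    return share

print("Treatment share:", validate_treatment_share())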
Takeaway¶
Reproducibility is part of the empirical argument. A result that cannot be reproduced is not yet a finished result.