Experiments

The journal is the running log. These are the experiments proper: each one is a standalone, reproducible write-up that someone else (or future me) could re-run and check.

How an experiment is written up

One file per experiment, same section order, so they are comparable:

  1. Question: the hypothesis, in one or two sentences. What would confirm it, what would falsify it.

  2. Why it matters: the assumption being tested, not the machinery.

  3. Subject under test: model, backend, system version (pin the commit), paradigm, temperature. Everything needed to know what was measured.

  4. Reproduction: exact commands. Copy-paste to stand it up.

  5. Variables: independent (what is changed), dependent (what is observed, with operational definitions), controlled (what is held fixed).

  6. Protocol: how a run is conducted, sample size N, how outcomes are classified. Decide this before running.

  7. Threats to validity: confounds and limits, stated plainly.

  8. Results: the data, tagged with N. Single runs are anecdotes; only rates are results.

  9. Interpretation: tentative, separated from the data.

  10. Status / next: what is settled, what is pending.

  11. Log: dated lab-notebook entries as the experiment progresses.
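Because every write-up shares one section order, the layout can be checked mechanically. The sketch below is one possible way to do that, not part of the existing tooling. The section titles come from the list above; the helper names (`find_headings`, `check_writeup`) and the heading format are assumptions: a heading is taken to be a line holding only the section title, optionally preceded by `#` marks or a number and optionally followed by a colon.

```python
import re

# Section titles in the required order, taken from the list above.
REQUIRED_SECTIONS = [
    "Question",
    "Why it matters",
    "Subject under test",
    "Reproduction",
    "Variables",
    "Protocol",
    "Threats to validity",
    "Results",
    "Interpretation",
    "Status / next",
    "Log",
]


def find_headings(text):
    """Return the required section titles in the order they appear.

    Assumption: a heading line holds only the title, optionally preceded
    by '#' marks or a number, and optionally followed by ':'.
    """
    known = {title.lower(): title for title in REQUIRED_SECTIONS}
    found = []
    for line in text.splitlines():
        # Strip leading '#', digits, dots and spaces, then a trailing colon.
        cleaned = re.sub(r"^[#\s\d.]*", "", line).strip().rstrip(":").strip()
        if cleaned.lower() in known:
            found.append(known[cleaned.lower()])
    return found


def check_writeup(text):
    """Return a list of problems; an empty list means the layout matches."""
    found = find_headings(text)
    problems = [f"missing section: {t}" for t in REQUIRED_SECTIONS if t not in found]
    # Compare first occurrences against the required order of what is present.
    first_seen = list(dict.fromkeys(found))
    expected = [t for t in REQUIRED_SECTIONS if t in found]
    if first_seen != expected:
        problems.append("sections out of order")
    return problems
```

Calling `check_writeup(open(path).read())` on each experiment file and failing on a non-empty result would keep the write-ups comparable; the heading convention would need adjusting to whatever the files actually use.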

Two house rules, learned the hard way:

  • No conclusions from n=1. At non-zero temperature the outcome of the same setup can flip from one run to the next. Replicate, then report the split.

  • Falsify, do not assert. “It did not work here” is not a refutation of the idea; check whether the mechanism was even present.
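As an illustration of "replicate, then report the split", the sketch below tallies classified outcomes and reports the rate together with N and a 95% Wilson score interval. The function names and the outcome labels are hypothetical, and the Wilson interval is one reasonable choice for a binomial rate rather than a method these write-ups prescribe.

```python
import math
from collections import Counter


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial rate (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("need at least one run")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def report_split(outcomes, target):
    """Summarise how often `target` occurred across replicated runs."""
    n = len(outcomes)
    k = Counter(outcomes)[target]
    low, high = wilson_interval(k, n)
    return f"{target}: {k}/{n} ({k / n:.0%}), 95% CI {low:.0%} to {high:.0%}, N={n}"


# Hypothetical outcome labels from ten replicated runs of one condition.
runs = ["pass"] * 7 + ["fail"] * 3
print(report_split(runs, "pass"))

# A single run: the interval spans most of the range, so it settles little.
print(report_split(["pass"], "pass"))
```

The second call shows why n=1 is an anecdote: one success out of one run leaves an interval from roughly 21% up to 100%.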