Posts tagged experiment

From seed to weights: fine-tuning a shell operator

Two cycles complete. On an archetype-level holdout (n=16, task types absent from training), fine-tuning lifts Mistral's termination rate from 0/16 (base) to 9/16 (tuned) under the same harness, with only the adapter differing. The operate/terminate mechanism generalises to unseen archetypes; task competence (verified 0.31) stays archetype-local. One model, one seed, but the signal is clean.
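As a rough sanity check on how far 0/16 versus 9/16 sits from chance, a one-sided Fisher exact test can be computed from those two counts alone. This is a sketch, not part of the original analysis: the function name `fisher_one_sided` is hypothetical, and the test assumes the 16 holdout tasks are independent trials, which a single model and a single seed cannot confirm.

```python
from math import comb


def fisher_one_sided(success_a, n_a, success_b, n_b):
    """One-sided Fisher exact p-value (hypothetical helper).

    Probability that group B shows at least `success_b` successes if the
    pooled successes were spread across both groups at random.
    """
    total_success = success_a + success_b
    denom = comb(n_a + n_b, total_success)
    upper = min(total_success, n_b)
    p = 0.0
    for k in range(success_b, upper + 1):
        # k successes land in group B, the remainder in group A
        p += comb(n_b, k) * comb(n_a, total_success - k) / denom
    return p


# Counts taken from the summary above: base 0/16, tuned 9/16.
p_value = fisher_one_sided(0, 16, 9, 16)
print(f"one-sided p = {p_value:.2e}")
```

A small p-value here only says the two counts are unlikely to come from the same termination rate; it says nothing about whether the result holds for other seeds or models.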

Read more ...


Scrollback priming: can synthetic history run a shell?

Replicated (N=5). Within llama3.1:8b, structure is the lever (0->2->5 clean). Across models (6 subjects, 3.8B-8B, both non-tool and tool-trained), two axes dissociate: operation transfers broadly, while clean exit is llama-only (2/154 for non-llama models). Neither scale nor tool training explains the gap; the leading read is that the seed is overfit to llama.

Read more ...