Regressions…#
After a nice business line meeting where I made a lot of claims about the SEK project I’m working on, it was time to reproduce those claims… Well, that didn’t work as expected. A couple of weeks had passed, and all I remembered were the problems I’d been having back then. One of the runit daemons I’ve developed for model instance management (llmsv) was behaving slowly, so I thought: before working on anything else, let me fix that, so that reproducing my findings is smooth sailing.
How wrong I was…
Instrumentation (issue #200)#
Per-stage timing on llmsv bring_up and on login motd-seed
execution (LOG_DEBUG, read back via dmesg).
Completion timing recorded in ccpty.read(), exposed through the
existing usage ioctl, reported on demand by llm status.
Numbers: bring_up ~1.7ms, seed ~15.6ms, first completion ~330-680ms.
Conclusion: the supervisor is not slow. Time-to-bring-up is dominated by
the first completion. “Slow” was never llmsv.
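The per-stage timing idea can be sketched in a few lines of Python. This is not the llmsv instrumentation (that logs at LOG_DEBUG and goes through the usage ioctl); `StageTimer` and `fake_clock` are hypothetical names, and the replayed timestamps are only picked to mirror the numbers above.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage, in milliseconds."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable, so the timer can be tested
        self.durations_ms = {}

    @contextmanager
    def stage(self, name):
        start = self._clock()
        try:
            yield
        finally:
            elapsed_ms = (self._clock() - start) * 1000.0
            self.durations_ms[name] = self.durations_ms.get(name, 0.0) + elapsed_ms

    def dominant(self):
        """Name of the stage that consumed the most time."""
        return max(self.durations_ms, key=self.durations_ms.get)


def fake_clock(ticks):
    """Hypothetical helper: a clock that replays fixed timestamps (seconds)."""
    it = iter(ticks)
    return lambda: next(it)


# Replayed timestamps: ~1.7ms bring_up, ~15.6ms seed, ~330ms first completion.
timer = StageTimer(clock=fake_clock([0.0, 0.0017, 0.0017, 0.0173, 0.0173, 0.3473]))
for name in ("bring_up", "seed", "first_completion"):
    with timer.stage(name):
        pass  # the real stage would run here
print(timer.dominant())
```

With the default clock the same wrapper times real work. The point is only that the dominant stage falls out of the data instead of out of a hunch.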
The actual problem (issue #193)#
The model drifts into prose/chat (“Let me try…”, “It seems…”) instead
of issuing shell commands. Commands buried in narration get run as
garbage, for example: sh: It: not found.
Ouch… A regression against the behavior I had observed and made claims about.
Scrollback fragmentation: the line editor echoed each keystroke as a
separate user message; cursor/redraw escapes leaked into context.
Fix-1: a ccpty is not a human terminal, so skip the visual line editor and use cooked reads. That cleaned the channel. The model still drifted, which exposed the real failure underneath.
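A toy model of the fragmentation, to make the difference concrete. This is not the ccpty code: both functions are hypothetical, and the redraw sequence (carriage return plus erase-to-end-of-line) is an assumption about what a typical visual line editor emits on each refresh.

```python
def per_keystroke_messages(typed):
    """Model of the old path: the editor redraws after every key,
    and each redraw is recorded as its own user message."""
    messages = []
    buffer = ""
    for ch in typed:
        if ch == "\n":
            buffer = ""  # line submitted, editor starts over
            continue
        buffer += ch
        # "\r" + ESC[K (erase to end of line) + the full line so far
        messages.append("\r\x1b[K" + buffer)
    return messages


def cooked_messages(typed):
    """Model of the fixed path: one message per completed line,
    with no redraw escapes in it."""
    return [line for line in typed.split("\n") if line]


print(len(per_keystroke_messages("ls -la\n")))  # 6 fragments, all with escapes
print(cooked_messages("ls -la\n"))              # ['ls -la']
```

In this model one six-character command costs six context messages, each carrying an escape sequence. The cooked path costs one clean message.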
Time travel#
Walked git history to find the last known-good state. Tagged the verified
baseline baseline/shell-operator-apr13 (870d9ac) with a repro recipe.
The rootfs is generated by installer.py, not tracked. The seed lives
there as a code constant; checking out old code does not revert config.
Persistent BlobFS journal has one-way schema drift; old code can’t read a
newer DB. Time travel needs rm -rf rootfs + reinstall, not just
checkout.
The observation harness changes per era: cctty became ccpty, scctty
became stty, and observing means a manual login -f <user> -t <dev> --
run in the background, plus stty/scctty for scrollback.
Findings#
Apr 13 ran command-only-then-exit. But printf there had no \x/\0
escape (printf '\x1bX' | wc -c = 5), so role-tags were literal text,
never parsed. The “system prompt” was an unparsed user message holding
readable English: “output ONLY commands, no prose.” It worked as an
imperative instruction, not as structured priming.
printf gained ESC (1a5fab2, Apr 23). False-memories seed landed
Apr 24 (0df2ae3). So structured synthetic history only became
functional Apr 24.
Apr 25 (88f9f05, earliest functional false-memories, exact seed,
parsing confirmed) drifts into prose, same as today. The seed's drift
behavior is unchanged across the whole window.
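The printf check from the first finding comes down to counting bytes. Here is a minimal sketch of that interpretation step. `classify_printf_output` is a hypothetical helper, and the byte strings stand in for what `printf '\x1bX'` writes on a given commit.

```python
ESC = b"\x1b"


def classify_printf_output(raw: bytes) -> str:
    """Decide whether printf expanded the \\x1b escape or printed it literally."""
    if ESC in raw:
        return "escape-emitted"  # a real ESC byte reached the output
    if b"\\x1b" in raw:
        return "literal-text"    # backslash, x, 1, b were printed as-is
    return "unknown"


literal = b"\\x1bX"   # what a printf without \x support writes
expanded = b"\x1bX"   # what a printf with \x support writes

print(len(literal), classify_printf_output(literal))    # 5 literal-text
print(len(expanded), classify_printf_output(expanded))  # 2 escape-emitted
```

Five bytes means the role-tag prefix reached the discipline as plain text. Two bytes means a real ESC was written. That is the same signal `wc -c` gave on the Apr 13 baseline.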
Drift hypotheses (settled)#
Candidate reasons the bare seed drifts. Unconfirmed.
The seed is a pattern to continue (implicit), not an instruction to obey (explicit). The model is instruction-tuned; the explicit lever may win.
The seed only demonstrates the success path. Drift starts at the first error, out-of-distribution for the seed, and the model may fall back to its trained behavior: explain and fix.
Recency beats the distant seed. One prose turn becomes the recent pattern and self-reinforces. No recovery mechanism.
Synthetic history is NOT shown to be weak. One config below produced clean command-only + exit from synthetic history alone, no imperative.
The only clean run had the token-identical prompt; removing it regressed. So the prompt looks load-bearing. But this is n=1 vs n=1 and could be variance. Not attributable yet.
Next: replicate each config N times, count clean-vs-drift, before any conclusion. (Done: see the scrollback-priming experiment.)
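The replication step could look roughly like the sketch below. It is not the harness used for the scrollback-priming experiment. `run_once` is a hypothetical callable that returns the model's output lines for one run. The drift check only looks for the two prose openers quoted earlier in this post, which is a deliberately crude assumption.

```python
from collections import Counter

# Prose openers taken from the drift examples earlier in this post.
PROSE_OPENERS = ("Let me", "It seems")


def is_drift(output_lines):
    """A run counts as drift if any line starts like narration."""
    return any(line.startswith(PROSE_OPENERS) for line in output_lines)


def replicate(run_once, n):
    """Run one config n times and tally clean vs drift outcomes."""
    tally = Counter()
    for i in range(n):
        outcome = "drift" if is_drift(run_once(i)) else "clean"
        tally[outcome] += 1
    return tally


def clean_rate(tally):
    total = tally["clean"] + tally["drift"]
    return tally["clean"] / total if total else 0.0


# Stand-in for a real model run: drifts on every fourth run.
def fake_run(i):
    return ["Let me try ls"] if i % 4 == 0 else ["ls", "exit"]


tally = replicate(fake_run, 8)
print(dict(tally), clean_rate(tally))
```

Comparing two configs then means comparing two tallies, not two anecdotes. At temperature 0.7 a single clean or drifting run says very little about either config.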
Lessons#
“Slow” was a symptom of the wrong layer. Instrument before diagnosing.
Stop concluding from n=1 at temperature 0.7. Did it repeatedly; every single-run conclusion got flipped by the next run. Replicate first.
An estimate is a region, not a commit. Pinning one commit from a fuzzy memory and reasoning as if it is THE state is an assumption, not evidence.
Falsify, don’t assert. The printf-ESC question was answered by wc -c
and by reading _ESCAPES on the actual commit, not by guessing.
“It doesn’t work here” is not a falsification of the technique; it can mean the mechanism (printf escapes, discipline parsing) wasn’t even present on that commit.
The remembered good behavior was an imperative instruction (as literal text), not false memories. False memories weren’t functional yet then. Prerequisites for the seed to render: printf must emit ESC; the discipline must walk multiple role-tags per write.
Token-identical seeding cuts both ways: matching the prompt also teaches the model to emit the prompt.