Checkpoint & Resume

An experiment is its on-disk dataset. There is no privileged in-process handle: views and runs rehydrate from disk on demand, so an experiment can be parked and resumed freely.

A checkpoint is a self-contained zip written to the experiment’s own directory. Experiment.Persist saves one. On resume, the workbench mints a fresh handle and the experiment loads its own checkpoint by id: it resolves the zip, extracts it, and reads the snapshot back. The same shape backs the CSV rehydrate path.
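
The persist and load steps can be sketched in Python as a rough illustration. This is not the workbench's actual implementation: the function names `persist` and `load`, the `<id>.zip` file naming, the `snapshot.json` member, and the use of a random hex id are all assumptions made for the example.

```python
import json
import tempfile
import uuid
import zipfile
from pathlib import Path


def persist(experiment_dir: Path, snapshot: dict) -> str:
    """Write the snapshot as a self-contained zip inside the experiment's directory."""
    checkpoint_id = uuid.uuid4().hex  # hypothetical id scheme
    experiment_dir.mkdir(parents=True, exist_ok=True)
    zip_path = experiment_dir / f"{checkpoint_id}.zip"  # hypothetical naming
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("snapshot.json", json.dumps(snapshot))
    return checkpoint_id


def load(experiment_dir: Path, checkpoint_id: str) -> dict:
    """Resolve the zip by id, extract it, and read the snapshot back."""
    zip_path = experiment_dir / f"{checkpoint_id}.zip"  # resolve
    with tempfile.TemporaryDirectory() as scratch:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(scratch)  # extract
        # Read back before the scratch directory is cleaned up.
        return json.loads((Path(scratch) / "snapshot.json").read_text())
```

Because the zip carries everything needed to rebuild the snapshot, a fresh handle only needs the experiment directory and a checkpoint id to resume.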

Two lifecycle operations sit on either side of drift, the state that has changed since the newest checkpoint:

| Operation | Effect |
| --- | --- |
| Reset | Evict the handle so the next use rehydrates from the newest checkpoint, discarding uncommitted drift. The engine half of a preset’s daily reset. |
| Hibernate | Persist, then evict, so the next use resumes the parked state, preserving drift. |
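
The difference between the two operations can be shown with a toy model. The sketch below is only an illustration under stated assumptions: the `Workbench` class, its method names, and the in-memory `checkpoints` dict standing in for the on-disk zips are all hypothetical.

```python
class Workbench:
    """Toy stand-in: `checkpoints` plays the role of disk, `handles` holds live state."""

    def __init__(self):
        self.checkpoints = {}  # experiment id -> list of persisted snapshots
        self.handles = {}      # experiment id -> live, mutable state

    def use(self, exp_id):
        """Return the live handle, rehydrating from the newest checkpoint if evicted."""
        if exp_id not in self.handles:
            saved = self.checkpoints.get(exp_id, [])
            self.handles[exp_id] = dict(saved[-1]) if saved else {}
        return self.handles[exp_id]

    def persist(self, exp_id):
        """Append a copy of the live state as the newest checkpoint."""
        self.checkpoints.setdefault(exp_id, []).append(dict(self.use(exp_id)))

    def reset(self, exp_id):
        """Evict only: uncommitted drift is discarded."""
        self.handles.pop(exp_id, None)

    def hibernate(self, exp_id):
        """Persist, then evict: drift is preserved."""
        self.persist(exp_id)
        self.handles.pop(exp_id, None)


wb = Workbench()
wb.use("exp")["turn"] = 1
wb.persist("exp")

wb.use("exp")["turn"] = 2           # drift since the checkpoint
wb.reset("exp")
turn_after_reset = wb.use("exp")["turn"]      # back to the checkpointed value

wb.use("exp")["turn"] = 2           # drift again
wb.hibernate("exp")
turn_after_hibernate = wb.use("exp")["turn"]  # drift survived the eviction
```

In both cases the handle is gone after the call. The only difference is whether a checkpoint is written first.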

Each experiment writes its own experiment.json manifest recording turn count and checkpoint ids. Listing, resolving, and deleting experiments all read these manifests rather than any live process state.
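
As a rough sketch of manifest-driven listing, the Python below writes an `experiment.json` per experiment directory and lists experiments by reading those files alone. The field names `turn_count` and `checkpoint_ids`, the one-directory-per-experiment layout, and both function names are assumptions. The source states only that the manifest records turn count and checkpoint ids.

```python
import json
from pathlib import Path

MANIFEST = "experiment.json"


def write_manifest(experiment_dir: Path, turn_count: int, checkpoint_ids: list[str]) -> None:
    """Record turn count and checkpoint ids next to the experiment's data."""
    experiment_dir.mkdir(parents=True, exist_ok=True)
    manifest = {"turn_count": turn_count, "checkpoint_ids": checkpoint_ids}  # hypothetical fields
    (experiment_dir / MANIFEST).write_text(json.dumps(manifest, indent=2))


def list_experiments(root: Path) -> dict[str, dict]:
    """Map each experiment directory name to its manifest, reading only from disk."""
    found = {}
    for manifest_path in sorted(root.glob(f"*/{MANIFEST}")):
        found[manifest_path.parent.name] = json.loads(manifest_path.read_text())
    return found
```

Since nothing here consults a running process, listing gives the same answer whether or not any experiment currently has a live handle.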