Scheduling
The orchestrator runs modules, but it does not decide when a module
runs. Each module returns a Schedule struct, and the orchestrator
obeys it.
type Schedule struct {
	Mode         ScheduleMode  // ScheduleOnce | SchedulePeriodic | ScheduleCron
	Interval     time.Duration
	InitialDelay time.Duration
	CronExpr     string        // ScheduleCron only
}
The three modes
SchedulePeriodic
The default for Killing and Rollout, and an opt-in for
GorillaKill via scenario.when: periodic.
- If InitialDelay > 0, the module waits InitialDelay, then ticks once immediately.
- A time.Ticker(Interval) then fires every Interval until cancellation.
Consequence: with wait: 10s and interval: 60s, the first tick is at
t=10s, then t=70s, t=130s, …
ScheduleOnce
Used by GorillaKill with scenario.when: once.
- The module waits InitialDelay (which may be zero).
- One tick runs.
- The goroutine exits, and the orchestrator reaps it as done.
There is no "retry on failure" for one-shot modules — if the single tick errors, the scenario is considered finished.
ScheduleCron
Available on all periodic modules (Killing, Rollout, GorillaKill
with when: periodic) via scenario.cron.
The expression follows the standard 5-field cron syntax (minute, hour, day-of-month, month, day-of-week):
scenario:
  cron: "0 2 * * *"   # every day at 02:00
At each tick, the orchestrator computes the next scheduled time using
gronx and sleeps until it arrives.
This is more precise than SchedulePeriodic for wall-clock schedules
like "every night at 2am".
cron and interval are mutually exclusive: validation rejects a
scenario that sets both.
wait vs interval
For periodic modules, YAML validation enforces:
0 <= wait < interval
This prevents configurations like "wait 5m, interval 1m" that would behave non-obviously.
For GorillaKill with when: once, wait has no upper bound: wait: 24h
("run once, 24 hours from now") is a valid scenario.
wait has no effect when cron is set — the cron expression already
encodes the exact schedule.
Concurrency model
- One goroutine per module. Independent modules run in parallel.
- Ticks of the same module are serialized, so two ticks never overlap:
the orchestrator takes its mutex for the duration of Run, and the
ticker channel is drained naturally.
- A slow Run delays future ticks. If a module's Run takes longer than
Interval, the next tick fires immediately after it returns, without
queueing. This is standard time.Ticker behavior: the ticker holds at
most one pending tick and drops the rest to make up for a slow receiver.
Implication: keep Run fast. Use testkit's time.AfterFunc-style
deferral for long waits. Don't block on long operations inside Run.
Graceful shutdown
On SIGINT / SIGTERM:
- The root context is canceled, so every Run(ctx) sees its ctx done.
- orch.Stop() closes a stopCh and waits for every module loop to return.
- Cross-cutting supervisors (loadkit.Supervisor, testkit.Runner) drain their in-flight work.
- The metrics server shuts down.
Modules that ignore their context will block shutdown — always thread
ctx through API calls.
Startup ordering
Modules start in the order they're declared in YAML (directory-loaded
configs are lexicographically sorted by filename). There is no
inter-module dependency graph — if A must run before B, model that
with wait:, not with a hidden ordering assumption.