Scheduling

The orchestrator schedules modules — it doesn't decide when a module runs. Each module returns a Schedule struct that the orchestrator obeys.

type Schedule struct {
    Mode         ScheduleMode   // ScheduleOnce | SchedulePeriodic | ScheduleCron
    Interval     time.Duration
    InitialDelay time.Duration
    CronExpr     string         // ScheduleCron only
}

The three modes

`SchedulePeriodic`

The default for Killing and Rollout, and an opt-in for GorillaKill via scenario.when: periodic.

If InitialDelay > 0: the module waits, then ticks once immediately.
A time.Ticker(Interval) then fires every Interval until cancellation.

Consequence: with wait: 10s and interval: 60s, the first tick is at t=10s, then t=70s, t=130s, …

`ScheduleOnce`

Used by GorillaKill with scenario.when: once.

The module waits InitialDelay (may be zero).
One tick runs.
The goroutine exits. The orchestrator reaps it as done.

There is no "retry on failure" for one-shot modules — if the single tick errors, the scenario is considered finished.

`ScheduleCron`

Available on all periodic modules (Killing, Rollout, GorillaKill with when: periodic) via scenario.cron.

The expression follows the standard 5-field cron syntax (minute, hour, day-of-month, month, day-of-week):

scenario:
  cron: "0 2 * * *"   # every day at 02:00

At each tick, the orchestrator computes the next scheduled time using gronx and sleeps until it arrives. This is more precise than SchedulePeriodic for wall-clock schedules like "every night at 2am".

cron and interval are mutually exclusive — validation rejects both being set.

`wait` vs `interval`

For periodic modules, YAML validation enforces:

0 <= wait < interval

This prevents configurations like "wait 5m, interval 1m" that would behave non-obviously.

For GorillaKill when=once, wait is unbounded — "run the first time in 24 hours" is a valid scenario.

wait has no effect when cron is set — the cron expression already encodes the exact schedule.

Concurrency model

One goroutine per module. Independent modules run in parallel.
Ticks of the same module are serialized. Two ticks never overlap — the orchestrator takes its mutex for the duration of Run, and the ticker channel is drained naturally.
A slow Run delays future ticks. If a module's Run takes longer than Interval, the next tick fires immediately after, without queueing. This is standard time.Ticker behavior — the ticker channel has capacity 1.

Implication: keep Run fast. Use testkit's time.AfterFunc-style deferral for long waits. Don't block on long operations inside Run.

Graceful shutdown

On SIGINT / SIGTERM:

The root context is canceled — every Run(ctx) sees its ctx done.
orch.Stop() closes a stopCh and waits for every module loop to return.
Cross-cutting supervisors (loadkit.Supervisor, testkit.Runner) drain their in-flight work.
The metrics server shuts down.

Modules that ignore their context will block shutdown — always thread ctx through API calls.

Startup ordering

Modules start in the order they're declared in YAML (directory-loaded configs are lexicographically sorted by filename). There is no inter-module dependency graph — if A must run before B, model that with wait:, not with a hidden ordering assumption.

The three modes​

SchedulePeriodic​

ScheduleOnce​

ScheduleCron​

wait vs interval​

Concurrency model​

Graceful shutdown​

Startup ordering​