Health Checks

Oxmgr health checks are command-based: the daemon runs a command on a polling interval and checks its exit code. Exit code 0 means healthy; any other exit code counts as a failure.

Basic Configuration

After health_max_failures consecutive failures, the process is automatically restarted.

oxfile.toml

[[apps]]
name = "api"
command = "node server.js"
health_cmd = "curl -fsS http://127.0.0.1:3000/health"
health_interval_secs = 15
health_timeout_secs = 3
health_max_failures = 3

Polling interval: health_interval_secs (default: 30s)
Check timeout: health_timeout_secs (default: 5s)
Restart trigger: health_max_failures consecutive failures (default: 3)
Healthy signal: exit code 0 = healthy, non-zero = failure
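
To make the counting rule concrete, here is a minimal shell sketch of the semantics described above. It is not Oxmgr's implementation. The function names check_once and watch_health and the MAX_CHECKS bound are hypothetical, the sketch relies on the GNU coreutils timeout command, and it assumes that a check exceeding its timeout counts as a failure, which this page does not state explicitly.

```shell
#!/bin/sh
# Illustrative model of the documented semantics; not Oxmgr's actual code.

# check_once CMD TIMEOUT_SECS
# Returns 0 only if CMD exits 0 before the timeout (GNU coreutils `timeout`).
# Assumption: a timed-out check is treated like any other failure.
check_once() {
  timeout "$2" sh -c "$1" >/dev/null 2>&1
}

# watch_health CMD INTERVAL_SECS TIMEOUT_SECS MAX_FAILURES MAX_CHECKS
# MAX_CHECKS is a hypothetical bound so that the sketch terminates.
watch_health() {
  wh_cmd="$1"; wh_interval="$2"; wh_timeout="$3"
  wh_max_failures="$4"; wh_max_checks="$5"
  wh_failures=0
  wh_checks=0
  while [ "$wh_checks" -lt "$wh_max_checks" ]; do
    if check_once "$wh_cmd" "$wh_timeout"; then
      wh_failures=0                       # any success resets the streak
    else
      wh_failures=$((wh_failures + 1))
      if [ "$wh_failures" -ge "$wh_max_failures" ]; then
        echo "restart"                    # the daemon would restart the app here
        return 1
      fi
    fi
    wh_checks=$((wh_checks + 1))
    sleep "$wh_interval"
  done
  echo "healthy"
}
```

Because a successful check resets the counter, isolated failures never trigger a restart; only an unbroken run of health_max_failures failures does.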

Any Command Works

The health command can be anything that returns an exit code: HTTP checks with curl, database pings, custom scripts, or CLI health checks.

oxfile.toml

[[apps]]
name = "redis"
command = "redis-server"
health_cmd = "redis-cli ping"
health_interval_secs = 10
health_timeout_secs = 2

Common patterns:

HTTP endpoint: curl -fsS http://127.0.0.1:3000/health (fails on connection errors and on HTTP status 400 or higher)
Redis connectivity: redis-cli ping
PostgreSQL readiness: pg_isready -h localhost
Custom logic: ./scripts/health-check.sh
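
As an illustration of the custom-script pattern, the following sketch shows one way a script such as ./scripts/health-check.sh could combine several checks into a single exit code. The contents are hypothetical, including the run_checks helper and the convention of passing each check as an argument; nothing here is prescribed by Oxmgr.

```shell
#!/bin/sh
# Hypothetical contents for scripts/health-check.sh.
# Exit status is 0 only if every check passes.

# run_checks CMD...
# Runs each command string in order and stops at the first failure.
run_checks() {
  for rc_check in "$@"; do
    if ! sh -c "$rc_check" >/dev/null 2>&1; then
      echo "health check failed: $rc_check" >&2
      return 1
    fi
  done
  return 0
}

# When the script is invoked with arguments, treat each one as a check.
if [ "$#" -gt 0 ]; then
  run_checks "$@"
fi
```

Run by hand, for example as ./scripts/health-check.sh "curl -fsS http://127.0.0.1:3000/health" "pg_isready -h localhost", the script's exit status is what the daemon would see. The list of checks could equally be hard-coded in the final call.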

Readiness Gating During Reload

With wait_ready = true, zero-downtime reloads wait for the new process instance to pass the health check before the old one is stopped. If readiness times out, the reload is aborted: the old process keeps running and continues to serve traffic without interruption.

oxfile.toml

[[apps]]
name = "api"
command = "node server.js"
health_cmd = "curl -fsS http://127.0.0.1:3000/health"
wait_ready = true
ready_timeout_secs = 30

Reload sequence with wait_ready

1. Run pre_reload_cmd (if set). Abort on failure.
2. Start new process instance.
3. Poll health_cmd until exit 0 (or ready_timeout_secs exceeded).
4. If ready: stop old process, new one takes over.
5. If timeout: abort reload, old process keeps running.
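
The sequence above can be modelled in a few lines of shell. This is a sketch of the control flow only, not Oxmgr's code: wait_ready and reload are hypothetical function names, the optional poll interval is an invented parameter, and starting or stopping processes is reduced to a comment and a printed outcome.

```shell
#!/bin/sh
# Illustrative model of the reload sequence; not Oxmgr's actual code.

# wait_ready HEALTH_CMD READY_TIMEOUT_SECS [POLL_SECS]
# Polls HEALTH_CMD until it exits 0 (returns 0) or the deadline passes (returns 1).
wait_ready() {
  wr_cmd="$1"; wr_timeout="$2"; wr_poll="${3:-1}"
  wr_deadline=$(( $(date +%s) + wr_timeout ))
  while :; do
    if sh -c "$wr_cmd" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$(date +%s)" -ge "$wr_deadline" ]; then
      return 1
    fi
    sleep "$wr_poll"
  done
}

# reload PRE_RELOAD_CMD HEALTH_CMD READY_TIMEOUT_SECS
# Prints which outcome of the numbered sequence applies.
reload() {
  rl_pre="$1"; rl_health="$2"; rl_timeout="$3"
  # Step 1: run pre_reload_cmd if set; abort on failure.
  if [ -n "$rl_pre" ] && ! sh -c "$rl_pre" >/dev/null 2>&1; then
    echo "aborted: pre_reload_cmd failed"
    return 1
  fi
  # Step 2: the new process instance would be started here.
  # Steps 3-5: poll until ready, then switch over or abort.
  if wait_ready "$rl_health" "$rl_timeout"; then
    echo "switched: old process stopped"
    return 0
  fi
  echo "aborted: old process kept running"
  return 1
}
```

The point the sketch highlights is that the old process is only stopped on the success path, so a failing pre_reload_cmd or a readiness timeout leaves the running instance untouched.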

Field Reference

Field	Default	Description
health_cmd	—	Command to run. Exit 0 = healthy, non-zero = failure.
health_interval_secs	30	Seconds between checks.
health_timeout_secs	5	Max seconds for a single check to complete.
health_max_failures	3	Consecutive failures before auto-restart.
wait_ready	false	Gate reloads on health check readiness. Requires health_cmd.
ready_timeout_secs	30	Timeout for readiness during reload. Abort if exceeded.
