Support recovery runbooks
These runbooks cover support cases that should not sit silently in Addie, certification, or registry queues.
Escalation SLA follow-up
Addie support requests are visible in the admin escalation dashboard and in the requester’s dashboard. The SLA enforcement job runs hourly. Admin follow-up rules:
- Urgent open requests are re-surfaced to the configured private escalation Slack channel after 4 hours.
- Other open requests are re-surfaced after 24 hours.
- Acknowledged or in-progress requests are re-surfaced after 24 hours without an update.
- Requesters receive a visible dashboard update after 24 hours and can add details or close the request themselves.
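The follow-up rules above can be sketched as a small decision function. This is a minimal illustration, not the actual SLA enforcement job: the status strings, the function name `follow_up_label`, and the mapping of open requests to “Needs pickup” and owned requests to “Needs update” are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Thresholds taken from the rules above.
URGENT_OPEN_SLA = timedelta(hours=4)
DEFAULT_SLA = timedelta(hours=24)


def follow_up_label(
    status: str,
    urgent: bool,
    created_at: datetime,
    last_update_at: Optional[datetime],
    now: datetime,
) -> Optional[str]:
    """Return a dashboard label if the request is due for re-surfacing.

    Status values and label mapping are hypothetical.
    """
    if status == "open":
        # Urgent open requests use the 4-hour window, others 24 hours.
        limit = URGENT_OPEN_SLA if urgent else DEFAULT_SLA
        return "Needs pickup" if now - created_at >= limit else None
    if status in ("acknowledged", "in_progress"):
        # Owned requests are measured from their most recent update.
        reference = last_update_at or created_at
        return "Needs update" if now - reference >= DEFAULT_SLA else None
    # Resolved or closed requests are never re-surfaced.
    return None
```

Because the job runs hourly, a request may be re-surfaced up to an hour after it crosses its threshold.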
- Open /admin/escalations.
- Filter to active requests and look for “Needs pickup” or “Needs update”.
- Acknowledge or move the request to in progress when a human owns it.
- Add requester-visible notes when the requester needs status.
- Resolve the request when fixed, and notify the user when appropriate.
/admin/settings.
Certification completion recovery
The certification recovery job runs every 6 hours. It looks for passed attempts that did not finish module or credential reconciliation. Automatic behavior:
- Reconcile the passed attempt to learner_progress.
- Re-run credential eligibility and awarding.
- File a deduplicated escalation if automatic repair cannot award the credential.
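The automatic behavior can be pictured as the sketch below. It assumes the three steps are available as callables and that deduplication is keyed on the attempt; `recover_attempt`, its parameters, and the key shape are hypothetical and do not describe the real job.

```python
from typing import Callable, Set, Tuple


def recover_attempt(
    attempt_id: str,
    reconcile: Callable[[str], None],
    award_credential: Callable[[str], bool],
    file_escalation: Callable[[str], None],
    open_keys: Set[Tuple[str, str]],
) -> str:
    """Run one recovery pass for a passed attempt (illustrative only)."""
    # Step 1: reconcile the passed attempt to learner progress.
    reconcile(attempt_id)
    # Step 2: re-run eligibility and awarding; True means the credential exists.
    if award_credential(attempt_id):
        return "awarded"
    # Step 3: escalate at most once per attempt (assumed dedup key).
    key = ("certification_recovery", attempt_id)
    if key in open_keys:
        return "already_escalated"
    open_keys.add(key)
    file_escalation(attempt_id)
    return "escalated"
```

Keeping the deduplication key outside the function means repeated 6-hour runs over the same stuck attempt do not flood the escalation queue.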
- Open /admin/certification.
- In “Attempts needing attention”, click “Reconcile credentials” for passed attempts.
- If the attempt still has warnings, open the learner detail panel.
- Use “Admin completion repair” only when there is a teaching checkpoint for the learner, module, and Addie thread.
- Run “Issue missing badges” when the local credential exists but Certifier badge fields are missing.
- Record the escalation ID or reason in the note field.
Domain verification recovery
Use admin domain verification recovery when a member has published the WorkOS DNS TXT record but the self-service verify path is blocked, rate-limited, or the webhook did not update local state. This is an internal support workflow. Use authenticated admin domain tooling or the private ops runbook; public docs should not expose admin endpoint paths, bearer-token examples, or WorkOS challenge-token handling. Outcomes:
- success: true: WorkOS confirmed DNS and local organization_domains was reconciled.
- still_pending: WorkOS still cannot see the TXT record. Ask the member to confirm the record name and token.
- no_challenge: Issue a new domain challenge from the member domain settings or admin domain tooling.
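For support tooling, the outcomes can be mapped to a next step. The sketch below assumes the outcome has already been reduced to one of three plain labels; the label `"success"`, the table name, and `next_action` are hypothetical, and the real response shape is not shown here.

```python
# Next step per recovery outcome, paraphrasing the runbook text.
NEXT_ACTION = {
    "success": "No further action: DNS is confirmed and local state is reconciled.",
    "still_pending": "Ask the member to confirm the TXT record name and token.",
    "no_challenge": "Issue a new domain challenge from the member domain "
                    "settings or admin domain tooling.",
}


def next_action(outcome: str) -> str:
    """Return the support follow-up for a recovery outcome label."""
    try:
        return NEXT_ACTION[outcome]
    except KeyError:
        # Unknown outcomes should be surfaced rather than silently ignored.
        raise ValueError(f"unrecognized recovery outcome: {outcome!r}")
```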
Registry and heartbeat recovery
Use these triggers after a member fixes adagents.json, brand.json, authorization, or endpoint health.
Publisher and brand crawl requests require an authenticated member or admin request and a JSON domain body:
409, 429, or 502 error if monitoring is paused, rate-limited, or the probe fails.
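When relaying one of these errors to a member, a small lookup keeps the wording consistent. This sketch assumes the three status codes correspond to the three causes in the order listed; `explain_probe_error` is a hypothetical helper, not part of any existing client.

```python
# Assumed pairing of status codes with the causes listed above.
PROBE_ERROR_REASONS = {
    409: "monitoring is paused",
    429: "the request was rate-limited",
    502: "the probe failed",
}


def explain_probe_error(status_code: int) -> str:
    """Translate a probe error status into a support-facing reason."""
    reason = PROBE_ERROR_REASONS.get(status_code)
    if reason is None:
        # Anything else is outside the documented set and needs a closer look.
        return f"unexpected status {status_code}"
    return reason
```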
Queue heartbeat without synchronous probing when the agent should be checked by the normal monitor: