Foresight - Small percentage of deployments experiencing outage – Incident details

Small percentage of deployments experiencing outage

Resolved
Partial outage
Started about 1 month ago
Lasted 43 minutes
Updates
  • Resolved
    This issue is considered resolved.

    Impact: A small percentage of Convex deployments experienced intermittent availability issues for approximately 17 minutes, from 22:42 to 22:59 PT.

    Root cause: One of our database clusters underwent a routine primary failover. During the promotion of the new primary, the orchestration system failed to provision a replica that was ready to accept new database writes.

    Out of an abundance of caution, Convex only uses database clusters that support semi-synchronous replication, in which at least two machines must persist a write before it is considered successful.

    Because no replica was ready, the cluster could not satisfy this two-machine requirement and was not usable by Convex. Human operators were paged to intervene and repair the broken automation. We're working with our database partners to better understand what went wrong and which changes will prevent a recurrence.

  • Monitoring
    A fix has been implemented and we are monitoring the results.

  • Investigating
    Seems related to a problematic database failover. Updates soon.
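To illustrate why a cluster without a ready replica was unusable under the semi-sync policy described in the resolution update, here is a minimal sketch of the "at least two machines persist the write" rule. Everything in it (`Node`, `Cluster`, `MIN_DURABLE_COPIES`, the `ready` flag) is a hypothetical model for illustration, not Convex's or any database vendor's actual implementation. One assumed design choice: the sketch refuses writes outright when the quorum cannot be met, whereas some real semi-sync implementations (for example MySQL's) can fall back to asynchronous replication after a timeout.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical model: these names do not correspond to any Convex or database API.
MIN_DURABLE_COPIES = 2  # the primary plus at least one replica must persist a write


@dataclass
class Node:
    name: str
    ready: bool = True  # whether this machine can currently persist writes
    log: list[str] = field(default_factory=list)

    def persist(self, record: str) -> bool:
        """Append the record if this node is ready; report whether it was persisted."""
        if not self.ready:
            return False
        self.log.append(record)
        return True


@dataclass
class Cluster:
    primary: Node
    replicas: list[Node]

    def usable(self) -> bool:
        """Writes can only be committed if the primary and at least one replica are ready."""
        return self.primary.ready and any(r.ready for r in self.replicas)

    def write(self, record: str) -> bool:
        """Commit a write only if at least MIN_DURABLE_COPIES machines persisted it."""
        if not self.usable():
            # Refuse the write instead of accepting one that a single machine holds.
            return False
        copies = int(self.primary.persist(record))
        # Count acknowledgements from every replica that persisted the record.
        copies += sum(replica.persist(record) for replica in self.replicas)
        return copies >= MIN_DURABLE_COPIES


if __name__ == "__main__":
    healthy = Cluster(Node("primary"), [Node("replica-1"), Node("replica-2")])
    print("healthy cluster accepts write:", healthy.write("w1"))

    # After a failover where the orchestration failed to provide a ready replica.
    degraded = Cluster(Node("new-primary"), [Node("replica-1", ready=False)])
    print("degraded cluster accepts write:", degraded.write("w2"))
```

In this model the degraded cluster rejects every write, which mirrors the situation in the update: the new primary alone cannot meet the two-machine durability requirement, so the cluster stays unusable until a replica is ready again.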