Partial outage due to external providers

Incident Report for Red Sift UK

Postmortem

Summary

On June 12, 2025, Red Sift experienced service disruptions affecting user authentication and Dynamic DNS functionality. Users were unable to sign up or sign in to the platform, and Dynamic DNS experienced elevated response times and API failures. The incident was caused by a widespread Google Cloud Platform Identity and Access Management (IAM) service outage that impacted multiple GCP products globally.

Root Cause

The incident was caused by a widespread Google Cloud Platform Identity and Access Management (IAM) service outage that began on June 12, 2025. According to Google's incident report, the outage affected multiple GCP products in all regions.

Red Sift's authentication system and Dynamic DNS service (both API and DNS components) depend on Google Cloud Platform services that were impacted by this IAM service disruption.

Impact

User Authentication

  • 100% of sign-in and sign-up attempts failed during the incident
  • Existing authenticated sessions remained functional

Dynamic DNS

  • DNS: 95th percentile response time increased from ~13 ms to 245 ms during the incident
  • API: Elevated failure rates and degraded performance
  • Both DNS resolution and API components significantly impacted
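
For readers less familiar with percentile metrics, the following is a minimal sketch of how a 95th percentile response time can be computed with the nearest-rank method, and of the slowdown implied by the figures above. The sample latencies and the function name are hypothetical and for illustration only. They are not Red Sift measurements, and this is not necessarily how Red Sift's monitoring computes the metric.

```python
import math


def percentile_nearest_rank(samples, percent):
    """Return the nearest-rank percentile of a list of numeric samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank method: take the value at position ceil(percent/100 * n).
    rank = math.ceil(percent * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical response times in milliseconds (illustrative, not real data).
hypothetical_ms = [11, 12, 12, 13, 13, 13, 14, 15, 240, 250]
p95 = percentile_nearest_rank(hypothetical_ms, 95)

# Slowdown implied by the reported figures (~13 ms before, 245 ms during).
slowdown = 245 / 13

print(f"p95 of hypothetical samples: {p95} ms")
print(f"Reported p95 slowdown: about {slowdown:.1f}x")
```

The 95th percentile is governed by the slowest 5% of requests, so a small share of very slow responses is enough to move it sharply even when most requests stay fast.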

Business Impact

  • New user acquisitions temporarily halted
  • No data loss or security breaches occurred

Timeline

All times are in British Summer Time (BST, UTC+1).

  • 18:55 - Service degradation detected
  • 19:02 - Root cause identified as external cloud provider issue
  • 19:22 - Incident declared on status page: "Partial outage due to external providers"
  • 20:00 - Recovery observed
  • 20:06 - Full service restoration confirmed
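
The timeline above is in BST, while the status page updates are stamped in UTC. The following is a minimal sketch that converts the timeline entries to UTC and computes the time from detection to full restoration. It assumes every entry falls on June 12, 2025 and uses a fixed UTC+1 offset for BST. The variable and function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# BST is UTC+1; a fixed offset is sufficient for a single summer date.
BST = timezone(timedelta(hours=1), name="BST")

# Timeline entries from the report (local BST times, assumed to be 2025-06-12).
TIMELINE = [
    ("18:55", "Service degradation detected"),
    ("19:02", "Root cause identified as external cloud provider issue"),
    ("19:22", "Incident declared on status page"),
    ("20:00", "Recovery observed"),
    ("20:06", "Full service restoration confirmed"),
]


def to_utc(hhmm):
    """Convert an HH:MM BST time on 2025-06-12 to an aware UTC datetime."""
    local = datetime.strptime(f"2025-06-12 {hhmm}", "%Y-%m-%d %H:%M")
    return local.replace(tzinfo=BST).astimezone(timezone.utc)


utc_timeline = [(to_utc(hhmm), event) for hhmm, event in TIMELINE]
duration = utc_timeline[-1][0] - utc_timeline[0][0]

for when, event in utc_timeline:
    print(f"{when:%H:%M} UTC - {event}")
print(f"Detection to full restoration: {duration}")
```

Under these assumptions, full restoration at 20:06 BST corresponds to 19:06 UTC, which matches the timestamp of the "Resolved" status update, and the time from detection to full restoration is 71 minutes.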

Resolution

The incident resolved automatically once Google Cloud Platform restored its IAM services. Red Sift's systems returned to normal operation without any manual intervention on our infrastructure.

Customer Impact Mitigation

  • Existing user sessions remained active throughout the incident
  • All user data and configurations remained intact
  • No security breaches or data loss occurred
  • Transparent communication provided through status page updates

We sincerely apologize for any inconvenience this outage may have caused. While this incident was due to an external cloud provider issue beyond our direct control, we remain committed to providing reliable service to our customers.

Posted Jun 13, 2025 - 13:40 UTC

Resolved

All systems have been restored to normal operation:

- User Authentication: Sign up and sign in functionality is fully operational
- Dynamic DNS API: Service has returned to normal performance levels

Our monitoring confirms that all affected services are now stable and operating as expected.
Posted Jun 12, 2025 - 19:06 UTC

Monitoring

We are pleased to report that all affected systems are returning to normal operation:

- User Authentication: Sign up and sign in functionality has been restored
- Dynamic DNS API: Service levels have returned to normal with failure rates back to baseline

Our monitoring indicates that all systems are now operating normally. We continue to monitor closely to ensure stability.
Posted Jun 12, 2025 - 19:00 UTC

Update

We are continuing to investigate this issue.
Posted Jun 12, 2025 - 18:49 UTC

Investigating

We are currently experiencing service disruptions affecting multiple systems:

- User Authentication: Sign up and sign in functionality is currently unavailable
- Dynamic DNS API: Experiencing elevated failure rates and intermittent connectivity issues

Our engineering team is actively investigating these issues and working to restore full service as quickly as possible. We are monitoring the situation closely and implementing mitigation measures.
Posted Jun 12, 2025 - 17:48 UTC
This incident affected: OnDMARC (Web Application and APIs, Dynamic Services), Red Sift Web Portal, Brand Trust (Web Application and APIs), and Certificates (Web Application and APIs).