Partial Outage of DRACOON Cloud

Incident Report for DRACOON Cloud

Postmortem

We experienced an issue with DRACOON Cloud on 2025-03-31 from around 07:30 to 08:15 CEST. Our team has identified the root cause and implemented a resolution. In this postmortem, we want to share what happened, why it happened, what we did to resolve it, and what we will do to prevent similar incidents in the future.

What happened?
DRACOON Cloud experienced performance degradation and intermittent service interruptions during the early morning usage hours, affecting user access and normal operation.

Why did this happen?
Application containers reached their memory limits during periods of high traffic. The container orchestration system restarted the affected instances automatically, and service was interrupted while it cycled through the unhealthy instances. The memory limits had been set too low and had not been updated to account for certain traffic spikes.
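To illustrate the failure mode described above, the following is a minimal sketch of how the headroom between a memory limit and observed peak usage could be checked. The function names, the 25% headroom default, and all sample values are hypothetical and are not taken from DRACOON's actual configuration.

```python
import math


def headroom_ratio(limit_mib: float, usage_samples_mib: list[float]) -> float:
    """Return the fraction of the limit still free at the highest observed usage.

    A value near zero (or negative) means a traffic spike can push the
    container over its limit and trigger a restart.
    """
    if limit_mib <= 0:
        raise ValueError("limit_mib must be positive")
    if not usage_samples_mib:
        raise ValueError("at least one usage sample is required")
    peak = max(usage_samples_mib)
    return (limit_mib - peak) / limit_mib


def suggest_limit_mib(usage_samples_mib: list[float], headroom: float = 0.25) -> int:
    """Return the smallest whole-MiB limit that keeps `headroom` free at peak usage.

    The 25% default is an illustrative assumption, not a recommendation.
    """
    if not usage_samples_mib:
        raise ValueError("at least one usage sample is required")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    peak = max(usage_samples_mib)
    return math.ceil(peak / (1.0 - headroom))


# Hypothetical samples (MiB) that include one traffic spike.
samples = [400.0, 450.0, 480.0, 510.0]
print(headroom_ratio(512.0, samples))
print(suggest_limit_mib(samples))
```

With the hypothetical samples, a 512 MiB limit leaves almost no room above the 510 MiB peak, which is the kind of situation in which a slightly larger spike leads to restarts.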

What did we do?
Our engineering team identified the container restart pattern through application logs and monitoring dashboards. We then increased the memory limits for the affected services and scaled up the number of container replicas to distribute the load.

What can we do to improve?
We will improve our monitoring, update memory limits based on actual usage patterns, and create automated scaling policies that add resources before limits are reached.
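As an illustration of what a proactive scaling policy can look like, here is a minimal sketch of a proportional scaling rule. It is similar in spirit to the rule documented for the Kubernetes Horizontal Pod Autoscaler, but this report does not state which orchestration system or policy DRACOON uses. The function name, the 60% target, and the replica bounds are assumptions chosen for the example.

```python
def desired_replicas(
    current_replicas: int,
    avg_utilization_pct: int,
    target_utilization_pct: int = 60,
    min_replicas: int = 1,
    max_replicas: int = 20,
) -> int:
    """Return the replica count that brings average utilization back to the target.

    Utilization is memory usage as a whole-number percentage of the limit.
    Because the target is well below 100%, replicas are added before any
    container reaches its limit.
    """
    if current_replicas < 1:
        raise ValueError("current_replicas must be at least 1")
    if not 0 < target_utilization_pct <= 100:
        raise ValueError("target_utilization_pct must be in (0, 100]")
    if avg_utilization_pct < 0:
        raise ValueError("avg_utilization_pct must not be negative")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")

    # Integer ceiling division avoids floating-point rounding surprises.
    load = current_replicas * avg_utilization_pct
    wanted = (load + target_utilization_pct - 1) // target_utilization_pct

    # Clamp to the configured bounds.
    return max(min_replicas, min(wanted, max_replicas))


# Hypothetical example: 4 replicas averaging 90% of their memory limit.
print(desired_replicas(4, 90))
```

In the hypothetical example, four replicas at 90% average utilization with a 60% target yield six replicas, so capacity is added while each container still has room below its limit.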

We apologize for any inconvenience this incident may have caused. We are committed to ensuring the stability and reliability of our services and will continue to take proactive measures to prevent similar incidents from happening in the future.

If you have any questions or concerns, please don't hesitate to reach out to our support team for assistance.

Posted Sep 09, 2025 - 17:18 CEST

Resolved

The issue with DRACOON Cloud has been fully resolved. All systems are now operating normally. We apologize for any inconvenience this may have caused and appreciate your patience. If you continue to experience any issues, please don't hesitate to reach out to our support team for assistance.
Posted Mar 31, 2025 - 10:56 CEST

Monitoring

The issue with DRACOON Cloud has been resolved, and we are monitoring the situation to ensure it remains stable. We apologize for any inconvenience this may have caused and appreciate your patience.
Posted Mar 31, 2025 - 08:11 CEST

Investigating

We are currently investigating an issue with DRACOON Cloud. Our team is working to gather more information and resolve the issue as quickly as possible. We apologize for any inconvenience this may cause and will provide updates as soon as we have them.
Posted Mar 31, 2025 - 07:30 CEST
This incident affected: API (API (group 01), API (group 02), API (group 03), API (group 04), API (group 05), API (group 06), API (group 07), API (group 08), API (group 09)).