On 2023-11-10, between 09:27 and 10:24, we experienced an issue with the DRACOON cloud. Our team has worked diligently to identify the root cause and implement a resolution. In this post-mortem, we want to share the details of what happened, why it happened, what we did to resolve it, and what we will do to prevent similar incidents in the future.
To fix an unrelated problem and improve performance, we implemented a change to some of our components. This change resulted in 5xx errors and caused the cloud outage.
Why did this happen?
The change caused additional services to retry requests when an error occurred. The resulting extra retry traffic overloaded some of our main components, which then returned 5xx errors.
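To illustrate why enabling retries in more services can overload the components behind them, here is a minimal sketch of retry amplification. It assumes a simple chain of services where each layer retries a failing call to the next one; the three-layer chain and the attempt counts are hypothetical and do not describe DRACOON's actual architecture or configuration.

```python
from math import prod


def worst_case_attempts(attempts_per_layer: list[int]) -> int:
    """Return how many requests reach the innermost service for a single
    client request when every call fails and each layer exhausts its attempts.

    attempts_per_layer[i] is the total number of tries (1 original call plus
    retries) made by hypothetical layer i of a call chain.
    """
    # Each try at an outer layer triggers a full round of tries at the next
    # layer, so the counts multiply rather than add.
    return prod(attempts_per_layer)


# Hypothetical chain of three services, each allowed 3 tries (2 retries).
only_one_layer_retries = [3, 1, 1]
every_layer_retries = [3, 3, 3]

print(worst_case_attempts(only_one_layer_retries))  # 3
print(worst_case_attempts(every_layer_retries))     # 27
```

With retries in a single layer, one failing request produces at most 3 calls to the innermost service; with retries in every layer, the same request can produce 27. This multiplication is why limiting retries to a single layer keeps the extra load bounded.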
What did we do?
We reverted the change as soon as we identified the problem.
What can we do to improve?
To prevent this from happening again, retries will be enabled only for our main components.
We apologize for any inconvenience this incident may have caused. We are committed to ensuring the stability and reliability of our services and will continue to take proactive measures to prevent similar incidents.
If you have any questions or concerns, please don't hesitate to reach out to our support team for assistance.