We would like to share more details about the events that occurred with Phrase between 10:02 AM and 05:14 PM CEST on April 19, 2024, which led to degraded performance of the CAT Web Editor and Termbase services, and about what Phrase engineers are doing to prevent these issues from recurring.
Root Cause
The datastore of the Termbase component was hit by an increased load from a newly introduced check that is intended to alert engineers about suboptimal runtime characteristics. At the same time, the key components of the datastore did not have enough memory to start properly given the size of the environment.
The check described above was rewritten to be much more lightweight, and the nodes running the datastore were given twice as much memory as before.
First, we want to apologize. We know how critical our services are to your business. Phrase as a whole will do everything to learn from this incident and use it to drive improvements across our services. As with any significant operational issue, Phrase engineers will be working tirelessly over the coming days and weeks to improve their understanding of the incident and determine what changes will improve our services and processes.