Prolonged production downtime / degradation?

mirek · 10 October 2022 18:11

Hello,

Starting at around 2022-10-10T10:25:00Z (UTC), we began seeing significant performance degradation when requesting jobs on production, with periodic 503s and 504s as well as response times longer than 100 seconds (the maximum timeout on our end). The issues persisted until around 2022-10-10T11:24:00Z.

Some of the 503 responses included the following body (note the raw HTML, as opposed to the expected API error response format):

<h2>This website is under heavy load (queue full)</h2><p>We're sorry, too many people are accessing this website at the same time. We're working on this problem. Please try again later.</p>

Sample 504 response content:

<html> <head><title>504 Gateway Time-out</title></head> <body> <center><h1>504 Gateway Time-out</h1></center> </body> </html>
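
For context, a client can tell gateway-level responses like these apart from regular API errors by checking whether the body parses as JSON. The following is a minimal sketch, not Stuart's documented behaviour: it assumes structured API errors arrive as a JSON object and that 502/503/504 are worth retrying, and the names `classify_response` and `Classified` are hypothetical.

```python
import json
from dataclasses import dataclass

# Assumption: these statuses indicate a transient gateway or overload problem.
RETRYABLE_STATUSES = {502, 503, 504}


@dataclass
class Classified:
    kind: str        # "ok", "api_error", or "gateway_error"
    retryable: bool  # whether a later retry is reasonable
    detail: str      # short excerpt of the body for logging


def classify_response(status: int, body: str) -> Classified:
    """Classify a response by its status code and whether the body is a JSON object."""
    if 200 <= status < 300:
        return Classified("ok", False, "")

    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        payload = None

    retryable = status in RETRYABLE_STATUSES
    excerpt = body.strip()[:80]
    if isinstance(payload, dict):
        # Assumed shape: structured API errors are JSON objects.
        return Classified("api_error", retryable, excerpt)
    # Anything else (e.g. an HTML page from a proxy) is treated as a gateway error.
    return Classified("gateway_error", retryable, excerpt)
```

Passing the 503 sample body to `classify_response(503, body)` gives a `gateway_error` that is marked retryable, which makes it easier to log these separately from errors returned by the API itself.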

We’re still assessing the impact this presumed Stuart downtime had on our operations. While we continue our post mortem, could you please confirm the incident on your end and share any additional details (e.g. duration, root cause, extent of impact)?

mirek · 11 October 2022 16:23

Hi team, just following up here. We are awaiting your response so that we can provide some clarity to our own customer base.

Adrien · 11 October 2022 17:12

Hello @mirek,

We apologise for the delayed response.

Indeed, yesterday we experienced issues with our API.

You probably received a notification at the email address you use with your Stuart account.
If not, please reach out to cse@stuart.com and we will add you to the list so that you receive such notifications in future, as well as the postmortem that we will release in the next few days.

For more information on related Tips & Best Practices, please see our post Incidents & Outages.

Thank you for your understanding

mirek · 11 October 2022 17:42

Hi @Adrien, thank you, will do. Looking forward to the post mortem.

mirek · 18 October 2022 18:33

Hi @Adrien, we still haven’t seen any postmortem come through. Has it not been sent yet (in which case, when should we expect it), or should we check whether there was an issue adding us to the notification mailing list?

Adrien · 19 October 2022 07:50

Hi @mirek,

The postmortem was sent last week.
You are probably not on our list yet. Could you please reach out to cse@stuart.com so that we can send you the postmortem and add you to the list?

Thank you in advance
