Job occasionally reported as "not found" immediately after successful job creation

Repro steps: create a job (via the “Job creation” endpoint), then immediately use the returned job ID to request the job details (via the “Get a job” endpoint).

Expected behavior: the latter request always succeeds and returns the just-created job.

Observed behavior: the latter request sometimes fails with a NOT_FOUND error.

Detailed example: production job ID 216196376

  • Job creation request initiated at 2022-09-20T15:34:27.273+02:00
  • 201 success response received at 2022-09-20T15:34:29.026+02:00
  • Job details request initiated at 2022-09-20T15:34:29.038+02:00
  • 404 error response received at 2022-09-20T15:34:29.164+02:00

Other examples: production job IDs 216223941, 216233368, 216240659, 216241744

Comments: it seems like the job creation endpoint is reporting success prior to the changes actually being fully persisted on Stuart’s end. A slightly slower job creation response time would be preferable to a job creation response that can’t be trusted, as working around the observed behavior introduces a lot of undesirable complexity.
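
For reference, the kind of workaround meant here is roughly the following: retry the “Get a job” call for a short while whenever it reports NOT_FOUND right after creation. This is only a minimal sketch, not Stuart's API. `fetch_job` stands in for whatever client function performs the request, and `JobNotFound`, the attempt count, and the delays are all hypothetical names and values chosen for illustration.

```python
import time
from typing import Any, Callable


class JobNotFound(Exception):
    """Hypothetical error: raised by the caller's client on a 404 / NOT_FOUND."""


def get_job_with_retry(
    fetch_job: Callable[[int], Any],
    job_id: int,
    attempts: int = 5,
    initial_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch_job(job_id), retrying on JobNotFound with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch_job(job_id)
        except JobNotFound:
            if attempt == attempts:
                # Still missing after the last attempt: treat it as a real 404.
                raise
            sleep(delay)
            delay *= 2  # wait twice as long before the next attempt
```

Even in this small form, every caller has to pick a retry budget and accept that a genuine NOT_FOUND is only reported after the full delay, which is the complexity mentioned above.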

Hello @mirek,

Thank you for your message.

Our platform runs on a distributed architecture, so requesting the job details within the same second as receiving the creation response can lead to this situation.

However, this should not be an issue, as all the information about the job is already in the job creation response. If you want to receive job updates, you should use webhooks instead.
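
One way to act on this, sketched under assumptions: keep the payload from the job creation response locally and serve early reads from it, so that no “Get a job” call is needed right after creation. The class below is illustrative only. `create_job` and `fetch_job` are placeholders for the caller's own client functions, and the `"id"` field name in the creation response is an assumption.

```python
from typing import Any, Callable, Dict


class JobStore:
    """Keeps creation responses locally so immediate reads skip the API."""

    def __init__(
        self,
        create_job: Callable[[Dict[str, Any]], Dict[str, Any]],
        fetch_job: Callable[[int], Dict[str, Any]],
    ) -> None:
        self._create_job = create_job
        self._fetch_job = fetch_job
        self._jobs: Dict[int, Dict[str, Any]] = {}

    def create(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        """Create a job and remember the response under its ID."""
        job = self._create_job(payload)
        self._jobs[job["id"]] = job  # "id" is an assumed field name
        return job

    def update(self, job: Dict[str, Any]) -> None:
        """Replace the stored copy, e.g. from a webhook handler."""
        self._jobs[job["id"]] = job

    def get(self, job_id: int) -> Dict[str, Any]:
        """Return the stored copy if present, otherwise ask the API."""
        if job_id in self._jobs:
            return self._jobs[job_id]
        return self._fetch_job(job_id)
```

The stored copy reflects the job only as of creation (or the last `update` call), so it suits reads made shortly after creation rather than status tracking.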

Unfortunately, we cannot extend the response time because in most cases users need a quick response.

Understood, figured as much. Thanks for the quick response!
