Repro steps: create a job (via the “Job creation” endpoint) and immediately afterwards use the returned job ID to request the job details (via the “Get a job” endpoint).
Expected behavior: the latter request always succeeds and returns the just-created job.
Observed behavior: the latter request sometimes fails with a NOT_FOUND error.
Detailed example: production job ID 216196376
- Job creation request initiated at 2022-09-20T15:34:27.273+02:00
- 201 success response received at 2022-09-20T15:34:29.026+02:00
- Job details request initiated at 2022-09-20T15:34:29.038+02:00
- 404 error response received at 2022-09-20T15:34:29.164+02:00
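For reference, the intervals implied by the timestamps above can be worked out with a short Python snippet. The dictionary keys and the helper name `ms_between` are labels chosen here for illustration; only the timestamps come from the report.

```python
from datetime import datetime

# Timestamps copied from the detailed example above (job ID 216196376).
# The key names are illustrative labels, not API field names.
events = {
    "create_sent": "2022-09-20T15:34:27.273+02:00",
    "create_201": "2022-09-20T15:34:29.026+02:00",
    "details_sent": "2022-09-20T15:34:29.038+02:00",
    "details_404": "2022-09-20T15:34:29.164+02:00",
}
t = {name: datetime.fromisoformat(ts) for name, ts in events.items()}


def ms_between(start, end):
    """Elapsed time between two labelled events, in whole milliseconds."""
    return round((t[end] - t[start]).total_seconds() * 1000)


create_duration_ms = ms_between("create_sent", "create_201")     # creation round trip
gap_ms = ms_between("create_201", "details_sent")                # 201 received -> details request sent
details_duration_ms = ms_between("details_sent", "details_404")  # details round trip

print(create_duration_ms, gap_ms, details_duration_ms)  # prints: 1753 12 126
```

In other words, the details request was sent 12 ms after the 201 response arrived, and the 404 came back 126 ms later, so the job was still not retrievable roughly 138 ms after creation had been reported as successful.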
Other examples: production job IDs 216223941, 216233368, 216240659, 216241744
Comments: it seems that the job creation endpoint reports success before the changes have been fully persisted on Stuart’s end. A slightly slower job creation response would be preferable to one that can’t be trusted, as working around the observed behavior introduces a lot of undesirable complexity.
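To illustrate the kind of complexity meant here, below is a minimal sketch of a client-side workaround: retrying the “Get a job” call with exponential backoff whenever it returns NOT_FOUND. Everything in it is an assumption made for illustration: `JobNotFound`, `get_job_with_retry`, and the `fetch_job` callable are hypothetical names standing in for the caller's own client code, and the attempt count and delays are arbitrary, since it is unknown how long the job can remain unavailable.

```python
import time


class JobNotFound(Exception):
    """Hypothetical error raised by the caller's client on a 404 NOT_FOUND."""


def get_job_with_retry(fetch_job, job_id, attempts=5, initial_delay=0.2,
                       sleep=time.sleep):
    """Call fetch_job(job_id), retrying on JobNotFound with exponential backoff.

    fetch_job is whatever function wraps the "Get a job" endpoint (assumed).
    attempts and initial_delay are arbitrary illustrative values.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch_job(job_id)
        except JobNotFound:
            if attempt == attempts:
                raise  # give up: the job may genuinely not exist
            sleep(delay)
            delay *= 2  # wait twice as long before the next attempt
```

Even this small wrapper forces every caller to pick a retry budget and makes it impossible to tell a not-yet-visible job from one that really does not exist until the retries run out.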