Elevated Error Rates
Incident Report for iFit
Postmortem

Workout API Errors

Date

2019-06-20

Summary

A spike in workout-related API requests consumed a large amount of resources, which increased latency and API error rates.

Impact

Website users experienced page load failures on pages that use these APIs, including the dashboard. Users of the mobile and embedded console applications experienced page load failures on the library and dashboard screens.

Root Causes

Shortly after a code deploy, we saw a large spike in calls to some of our Lambda workout APIs. These Lambdas fetch workouts from our core API. One route in the core API experienced especially high throughput and latency. We believe that the heavy load on this route was responsible for most of the issues.

Resolution

We added caching to the problematic route at the nginx layer. We also scaled up the MongoDB cluster that our production workout Lambdas connect to, to further improve performance.
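
This report does not include the nginx configuration itself. As a rough illustration of why short-lived caching relieves a hot route, the sketch below models the idea in Python: repeated requests for the same path within a time-to-live window are served from memory, so only the first one reaches the backend. The class name, the route path, and the 30-second TTL are assumptions made for the example, not details of the production setup.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory response cache keyed by request path (illustrative only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # path -> (time the response was stored, response body)
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_fetch(self, path: str, fetch: Callable[[str], str]) -> str:
        """Return a cached body if it is still fresh, else call the backend."""
        now = self.clock()
        entry = self._entries.get(path)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            self.hits += 1
            return entry[1]
        # Missing or expired: go to the backend once and store the result.
        self.misses += 1
        body = fetch(path)
        self._entries[path] = (now, body)
        return body


# Hypothetical backend standing in for the expensive core API route.
backend_calls = []


def slow_backend(path: str) -> str:
    backend_calls.append(path)
    return f"workouts for {path}"


cache = TTLCache(ttl_seconds=30)  # assumed TTL, chosen for the example
for _ in range(100):
    cache.get_or_fetch("/hypothetical/workouts", slow_backend)

print(f"requests: 100, backend calls: {len(backend_calls)}")
```

In this sketch a burst of 100 identical requests results in a single backend call. The trade-off is that responses can be up to one TTL stale, which is usually acceptable for data that changes infrequently.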

Action Items

  • A more robust and comprehensive alarming solution is in progress.
Posted Jun 20, 2019 - 15:00 MDT

Resolved
This incident has been resolved.
Posted Jun 19, 2019 - 19:58 MDT
Monitoring
We are currently monitoring an issue that caused error rates to increase on workout-related API routes. Users were affected by dashboard and library page load failures. Error rates have now returned to normal. We will continue to monitor this issue and work on improvements to prevent further issues.
Posted Jun 19, 2019 - 17:33 MDT