Async migration and Zero Downtime
When interdependent applications are built on Frappe, migrations of DocType schemas are often time-consuming and cause downtime.
Site Security
There are two controls that we use around bench --site <SITE> migrate:
- pausing the site (putting it under maintenance)
- pausing the scheduler (halting Redis and background activities)
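To make the two controls concrete, here is a minimal sketch that flips them as flags in a site's JSON config file. It assumes the flags are stored as the maintenance_mode and pause_scheduler keys of site_config.json; the key names and the helper set_site_flags are assumptions for illustration and should be checked against your Frappe version.

```python
import json
import tempfile
from pathlib import Path


def set_site_flags(config_path, maintenance, pause_scheduler):
    """Write the two lock flags into a site config file and return the new config."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    # Assumed key names; verify them against your Frappe version.
    config["maintenance_mode"] = 1 if maintenance else 0
    config["pause_scheduler"] = 1 if pause_scheduler else 0
    path.write_text(json.dumps(config, indent=1))
    return config


if __name__ == "__main__":
    # Demo against a throwaway file, not a real site.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "site_config.json"
        demo.write_text(json.dumps({"db_name": "demo"}))
        print(set_site_flags(demo, maintenance=True, pause_scheduler=True))
        print(set_site_flags(demo, maintenance=False, pause_scheduler=False))
```

Both flags apply to the whole site, which is why a migration of a single app still takes every app on that site offline.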
Enterprises often have a large number of apps across several organizations, which makes releases frequent and such downtime unacceptable.
The two controls above generate a lot of downtime whenever updates are released across applications, and Frappe does not offer any app-specific downtime control because the apps are too tightly coupled.
What if we don't Lock?
We will test several scenarios involving schema modifications without halting the site (API layer) to identify points of failure.
Please keep in mind that we cannot resume background processes during the migration: failures there are more difficult to handle since there is no user-controlled retry.
Let's narrow down the DocType update scenarios and run some performance and failure tests:
- A new field is added
- An existing field is deleted
- An existing field's name is edited
- An existing field's type is edited
- A child table is added
Metrics for Testing
We will load test Frappe APIs in parallel while the migration runs in the background, with the following settings:
- User Concurrency: 100
- Spawn Rate: 10
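The source does not name the load-testing tool, so the following is only a standard-library sketch of what "user concurrency" and "spawn rate" mean in practice. The function run_load and its send callback are hypothetical, and the sketch assumes the spawn rate is counted in users started per second.

```python
import threading
import time
from collections import Counter


def run_load(send, users=100, spawn_rate=10, duration=5.0):
    """Ramp up to `users` worker threads, starting `spawn_rate` of them per second.

    Each worker calls `send()` in a loop until `duration` seconds have elapsed
    and records either "ok" or the name of the exception that was raised.
    """
    results = Counter()
    lock = threading.Lock()
    deadline = time.monotonic() + duration

    def worker():
        while time.monotonic() < deadline:
            try:
                send()
                outcome = "ok"
            except Exception as exc:
                outcome = type(exc).__name__
            with lock:
                results[outcome] += 1

    threads = []
    for _ in range(users):
        if time.monotonic() >= deadline:
            break
        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        threads.append(thread)
        # Space out the starts so that `spawn_rate` users begin per second.
        time.sleep(1.0 / spawn_rate)
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Stand-in request; a real test would call a Frappe API endpoint here.
    print(run_load(lambda: time.sleep(0.01), users=5, spawn_rate=10, duration=1.0))
```

Counting outcomes by exception name makes it easy to compare a run that overlaps a migration with a baseline run.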
A New Field Is Being Added
Result: When a new field was added, the request rate slowed for the duration of the migration, but no obvious faults were identified.
Reports
Migration Time Stamp: 18:37:57
Existing Field Removed
Result: When an existing field was removed, nothing appeared to conflict and no errors were found.
Reports
Migration Time Stamp: 18:49:52
Existing Field Renamed
Result: When an existing field was renamed, POST requests editing or deleting docs failed with deadlocks; after a few errors, subsequent requests succeeded.
Reports
Migration Time Stamp: 19:05:37
Existing Field Type Edited
Result: When an existing field's type was changed, POST requests editing or deleting docs failed with deadlocks; after a few errors, subsequent requests succeeded.
Reports
Migration Time Stamp: 19:26:01
Child Table Added
Result: For the case where the existing field type was changed, POST requests editing or deleting docs failed with deadlocks; after a few errors, subsequent requests succeeded.
Reports
Migration Time Stamp: 19:54:33
Summary
Subject | Exception | Migration Success | Eventual API Success (On Retry) |
---|---|---|---|
New field being appended | ❌ | ✔️ | ✔️ |
Existing field removed | ❌ | ✔️ | ✔️ |
Existing field name edit | ✔️ | ✔️ | ✔️ |
Existing field type edit | ✔️ | ✔️ | ✔️ |
Child table added | ❌ | ✔️ | ✔️ |
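The "Eventual API Success (On Retry)" column suggests that callers can ride out a migration by retrying failed writes. Below is a minimal client-side sketch of that idea using exponential backoff with jitter. DeadlockError and call_with_retry are hypothetical names: how a deadlock surfaces to the API client was not captured in the tests above, so map it to whatever error your client actually receives.

```python
import random
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the error a client sees when a write deadlocks."""


def call_with_retry(request, retries=5, base_delay=0.1, sleep=time.sleep):
    """Call `request()`; on DeadlockError wait and retry, up to `retries` extra times."""
    for attempt in range(retries + 1):
        try:
            return request()
        except DeadlockError:
            if attempt == retries:
                raise  # Out of retries: surface the error to the caller.
            # Exponential backoff with jitter so retries do not arrive in lockstep.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_save():
        # Fails twice, then succeeds, like the POST requests during a rename.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise DeadlockError("deadlock while migration holds the table")
        return "saved"

    print(call_with_retry(flaky_save, base_delay=0.01))
```

Retrying is only safe when the request is idempotent, or when a deadlocked request is known to have been rolled back in full.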
Schema change prediction, pre-migrate and Blue/Green deployments
- Use the DocType Editor.
- Plan schema changes so that both the A and B versions of the application can keep serving requests.
- Generate a SQL patch from the DocType Editor changes and apply it ahead of the release (pre-migrate).
- Release and deploy the new version of the app.
- This can potentially ensure zero downtime.
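The patch-generation step could be sketched as follows. This is an illustration only, under several assumptions: field definitions are reduced to a name-to-type mapping, the COLUMN_TYPES table is a small illustrative subset rather than Frappe's real type mapping, and additive_patch is a hypothetical helper. It relies on Frappe's convention of naming a DocType's table tab<DocType>. Only additive changes go into the pre-migrate patch, because both the old and the new app version can run against a table that merely gained a nullable column.

```python
# Illustrative subset of field type -> MariaDB column type (an assumption).
COLUMN_TYPES = {
    "Data": "varchar(140)",
    "Int": "int(11)",
    "Text": "text",
    "Date": "date",
}


def additive_patch(doctype, old_fields, new_fields):
    """Split a schema change into a pre-migrate patch and deferred changes.

    Returns (statements, deferred): `statements` are ALTER TABLE ... ADD COLUMN
    lines for new fields; `deferred` lists type changes and removals that
    should wait until the old app version is no longer serving requests.
    """
    table = f"`tab{doctype}`"
    statements = []
    deferred = []
    for name, fieldtype in new_fields.items():
        if name not in old_fields:
            column = COLUMN_TYPES[fieldtype]
            statements.append(f"ALTER TABLE {table} ADD COLUMN `{name}` {column};")
        elif old_fields[name] != fieldtype:
            deferred.append(f"type change: {name} ({old_fields[name]} -> {fieldtype})")
    for name in old_fields:
        if name not in new_fields:
            deferred.append(f"removed: {name}")
    return statements, deferred


if __name__ == "__main__":
    old = {"subject": "Data", "priority": "Int", "legacy_note": "Text"}
    new = {"subject": "Data", "priority": "Data", "due_date": "Date"}
    statements, deferred = additive_patch("Task", old, new)
    print("\n".join(statements))
    print(deferred)
```

A rename cannot be detected from names alone: it shows up here as one removal plus one addition, so it would need an explicit old-to-new mapping. Renames and type changes are also the two scenarios that produced deadlocks in the tests above, which is a further reason to keep them out of the patch that runs while traffic is live.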