Async migration and Zero Downtime

When several interdependent applications run on Frappe, migrations of DocType schemas are often time-consuming and cause downtime.

Site Security

We rely on two controls when running bench --site <SITE> migrate:

  • pausing the site (putting it under maintenance)
  • pausing the scheduler (halting Redis and background activities)

Enterprises often run a large number of apps across several organizations, so releases are frequent and this kind of downtime is unacceptable.

Both controls add up to significant downtime whenever updates are released across applications, and Frappe offers no app-specific downtime control because the apps are too tightly coupled.

What if we don't Lock?

We will test several scenarios involving schema modifications without halting the site (API layer) to identify points of failure.

Keep in mind that background processes cannot be resumed while the migration runs. Failures there are harder to handle because there is no user-controlled retry.

Let's narrow down the DocType update scenarios and run performance and failure tests on each:

  • A new field is added
  • An existing field is removed
  • An existing field is renamed
  • An existing field's type is changed
  • A child table is added

Metrics for Testing

We load test the Frappe APIs in parallel while the migration runs in the background, using the following parameters:

  • User concurrency: 100
  • Spawn rate: 10
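To make the two parameters concrete, here is a minimal sketch of a ramped load run using only the Python standard library. It is not the tool used for the reports below. The names ramp_schedule and run_load are hypothetical, and the sketch assumes the spawn rate means "users started per ramp step" (one step per second by default). The request function is injected, so the actual HTTP call against a Frappe endpoint is left to the caller.

```python
import threading
import time
from collections import Counter


def ramp_schedule(users, spawn_rate):
    """Return the number of active users after each ramp step."""
    if spawn_rate <= 0:
        raise ValueError("spawn_rate must be positive")
    steps = []
    active = 0
    while active < users:
        active = min(users, active + spawn_rate)
        steps.append(active)
    return steps


def run_load(request_fn, users=100, spawn_rate=10,
             requests_per_user=5, step_seconds=1.0):
    """Start users in batches and tally the outcome of every request.

    request_fn is a caller-supplied callable (hypothetical here) that
    performs one API request and raises on failure.
    """
    results = Counter()
    lock = threading.Lock()

    def user_loop():
        for _ in range(requests_per_user):
            try:
                request_fn()
                outcome = "ok"
            except Exception as exc:
                # Group failures by exception type, e.g. deadlock errors.
                outcome = type(exc).__name__
            with lock:
                results[outcome] += 1

    threads = []
    started = 0
    for target in ramp_schedule(users, spawn_rate):
        for _ in range(target - started):
            thread = threading.Thread(target=user_loop)
            thread.start()
            threads.append(thread)
        started = target
        time.sleep(step_seconds)  # wait before spawning the next batch
    for thread in threads:
        thread.join()
    return results
```

With 100 users and a spawn rate of 10, ramp_schedule yields ten steps, so under the one-step-per-second assumption full concurrency is reached after about ten seconds. Starting the migration after that point keeps the ramp-up from distorting the comparison.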

A New Field Is Being Added

Result: When a new field was added, the request rate dropped while the migration ran, but no errors were observed.

Reports

Migration Time Stamp: 18:37:57

Complete Report & Request Log

Existing Field Removed

Result: When an existing field was removed, nothing conflicted and no errors were observed.

Reports

Migration Time Stamp: 18:49:52

Complete Report & Request Log

Existing Field Renamed

Result: When an existing field was renamed, POST requests that edited or deleted documents failed with deadlocks. After a few such errors, subsequent requests succeeded.

Reports

Migration Time Stamp: 19:05:37

Complete Report & Request Log

Existing Field Type Edited

Result: When an existing field's type was changed, POST requests that edited or deleted documents failed with deadlocks. After a few such errors, subsequent requests succeeded.

Reports

Migration Time Stamp: 19:26:01

Complete Report & Request Log

Child Table Added

Result: When a child table was added, the migration completed and API requests eventually succeeded.

Reports

Migration Time Stamp: 19:54:33

Complete Report & Request Log

Summary

Subject                       Exception   Migration Success   Eventual API Success (On Retry)
New field added               No          Yes                 Yes
Existing field removed        No          Yes                 Yes
Existing field renamed        Yes         Yes                 Yes
Existing field type changed   Yes         Yes                 Yes
Child table added             No          Yes                 Yes
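
Since the failing requests succeeded once they were retried, a client can absorb these transient errors with a small retry wrapper. The following is a minimal sketch, not part of the tests above. DeadlockError and retry_on_deadlock are hypothetical names, and how a deadlock actually surfaces to the caller (HTTP status, error body) is an assumption that has to be mapped onto this exception.

```python
import random
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a deadlock failure reported by the API."""


def retry_on_deadlock(call, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Run call(), retrying with exponential backoff when it deadlocks.

    Only DeadlockError is retried; any other exception propagates
    immediately. The last deadlock is re-raised once attempts run out.
    """
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except DeadlockError:
            if attempt == attempts:
                raise
            # Double the delay each time and add jitter so that
            # concurrent clients do not retry in lockstep.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))
```

A caller would wrap the request that edits or deletes a document, for example retry_on_deadlock(lambda: update_doc(payload)), where update_doc is whatever function issues the POST. Retrying is only safe when the request is idempotent or was rolled back in full.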

Schema change prediction, pre-migrate and Blue/Green deployments

  • Use DocType Editor
  • Plan schema changes so that both the A and B versions of the application can keep serving requests.
  • Generate a SQL patch from the DocType Editor changes and apply it ahead of the migration (pre-migrate).
  • Release and deploy the new version of the app.
  • This can potentially ensure zero downtime.
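
The pre-migrate step above could look roughly like the sketch below. It diffs two field definitions and emits SQL only for additive changes, which were the ones that ran without errors in the tests. The function name additive_sql_patch and the COLUMN_TYPES mapping are assumptions made for illustration. The tab<DocType> table name follows Frappe's naming convention, and ADD COLUMN IF NOT EXISTS is MariaDB syntax. Check the column types against what your Frappe version actually generates.

```python
# Assumed mapping from Frappe field types to MariaDB column types.
# Illustrative only; verify against your Frappe version.
COLUMN_TYPES = {
    "Data": "varchar(140)",
    "Int": "int(11)",
    "Text": "text",
    "Date": "date",
}


def additive_sql_patch(doctype, old_fields, new_fields):
    """Build ALTER TABLE statements for fields present only in new_fields.

    old_fields and new_fields map fieldname -> Frappe fieldtype.
    Type changes, removals and unmapped types are returned as deferred
    notes instead of SQL, because the old app version may still rely
    on the existing columns.
    """
    table = f"tab{doctype}"
    statements = []
    deferred = []
    for name, fieldtype in new_fields.items():
        if name not in old_fields:
            column = COLUMN_TYPES.get(fieldtype)
            if column is None:
                deferred.append(f"unmapped type: {name}")
                continue
            statements.append(
                f"ALTER TABLE `{table}` ADD COLUMN IF NOT EXISTS `{name}` {column};"
            )
        elif old_fields[name] != fieldtype:
            deferred.append(f"type change: {name}")
    for name in old_fields:
        if name not in new_fields:
            deferred.append(f"removed: {name}")
    return statements, deferred
```

For example, calling additive_sql_patch("Task", {"subject": "Data"}, {"subject": "Data", "priority": "Int"}) returns one ADD COLUMN statement for priority and an empty deferred list. Deferred changes are the ones to schedule for a later release, once no running version depends on the old column. A rename cannot be detected from the two mappings alone, so it shows up as one removal plus one addition.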