Vajira Prabuddhaka

Reputation: 932

Proper way to handle schema changes in MongoDB with java driver

I have an application that stores data in a cloud instance of MongoDB. To explain the requirement further, my data is currently organized at the collection level like below.

collection_1 : [{doc_1}, {doc_2}, ... , {doc_n}]
collection_2 : [{doc_1}, {doc_2}, ... , {doc_n}]
...
...
collection_n : [{doc_1}, {doc_2}, ... , {doc_n}]

Note: each collection name is a unique ID, and in this explanation I'm using collection_1, collection_2, ... to represent those IDs.

I want to change this data model to a single-collection model, as below. The collection ID will be embedded in each document to uniquely identify the data.

global_collection: [{doc_x, collection_id : collection_1}, {doc_y, collection_id : collection_1}, ...]
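For illustration, the per-document change can be sketched in plain Java (the `toGlobal` helper and the `name` field are assumptions for this example, not existing code; with the MongoDB Java driver the map would be an `org.bson.Document`):

```java
import java.util.HashMap;
import java.util.Map;

public class SchemaMapper {
    // Hypothetical helper: tags a document with its source collection id
    // so it can be stored in the single global collection.
    static Map<String, Object> toGlobal(String collectionId, Map<String, Object> doc) {
        Map<String, Object> out = new HashMap<>(doc);
        out.put("collection_id", collectionId);
        return out;
    }
}
```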

The data access layer (insert, update, delete, and query operations) for this application is written in a Java backend.

Additionally, the entire application is deployed on a k8s cluster.

My requirement is to perform this migration (both the data access layer change and the existing data migration) with zero downtime and without impacting any operation in the application. Assume that the application is heavily used, with high concurrent traffic.

What is the proper way to handle this? Any guidance would be appreciated.

For example, for the backend (data access layer) change, I could use temporary code in Java to support both models and perform the migration using an external client. If so, what is the proper way to implement this change? Are there any specific design patterns for it?

A complete explanation would be highly appreciated.

Upvotes: 3

Views: 600

Answers (1)

Tim

Reputation: 613

I think you have honestly already hinted at the simplest answer.

First, update your data access layer to handle both the new and old schemas: inserts and updates should write to both in order to keep them in sync. Queries should only read the old schema, as it's the source of record at this point.
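That dual-write step might look like the sketch below. The `Store` interface and class names are hypothetical, and an in-memory map stands in for the real MongoDB-backed collections so the shape of the pattern is clear:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical abstraction over the two data models.
interface Store {
    void insert(String collectionId, Map<String, Object> doc);
    List<Map<String, Object>> find(String collectionId);
}

// Simple in-memory stand-in for a MongoDB-backed store.
class MemStore implements Store {
    final Map<String, List<Map<String, Object>>> data = new HashMap<>();
    public void insert(String collectionId, Map<String, Object> doc) {
        data.computeIfAbsent(collectionId, k -> new ArrayList<>()).add(doc);
    }
    public List<Map<String, Object>> find(String collectionId) {
        return data.getOrDefault(collectionId, List.of());
    }
}

// Dual-write wrapper: every write goes to both schemas; reads stay on the
// old schema, which remains the source of record during the migration.
class DualWriteStore implements Store {
    private final Store oldStore;
    private final Store newStore;
    DualWriteStore(Store oldStore, Store newStore) {
        this.oldStore = oldStore;
        this.newStore = newStore;
    }
    public void insert(String collectionId, Map<String, Object> doc) {
        oldStore.insert(collectionId, doc);
        newStore.insert(collectionId, doc);
    }
    public List<Map<String, Object>> find(String collectionId) {
        return oldStore.find(collectionId);
    }
}
```

Because the wrapper implements the same interface as the stores it wraps, it can be swapped into the data access layer without touching calling code.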

Then copy all data from the old to the new schema.
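A one-off backfill along those lines could be sketched like this (pure Java over maps for clarity; against a real database you would iterate `db.listCollectionNames()` with the driver and batch writes with `insertMany` — the `backfill` name and structure here are assumptions):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Backfill {
    // Hypothetical backfill: copies every document from the old
    // per-collection layout into a single global list, tagging each
    // document with its source collection id.
    static List<Map<String, Object>> backfill(Map<String, List<Map<String, Object>>> oldModel) {
        List<Map<String, Object>> global = new ArrayList<>();
        for (Map.Entry<String, List<Map<String, Object>>> e : oldModel.entrySet()) {
            for (Map<String, Object> doc : e.getValue()) {
                Map<String, Object> copy = new HashMap<>(doc);
                copy.put("collection_id", e.getKey());
                global.add(copy);
            }
        }
        return global;
    }
}
```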

Then update the data access layer to query the new data. Writes still keep the old data updated, but this allows full testing of the new data before making any change that would let the two data sets fall out of sync. It also helps facilitate rolling updates (i.e. application instances running both the new and old data access code will still function at the same time).
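A minimal sketch of that read switch, assuming a feature flag guards which schema queries hit (class and method names here are made up for illustration):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical read-path toggle: a flag flips queries from the old schema
// to the new one; because dual writes keep both in sync, the flag can be
// flipped back at any time without losing data.
class ToggledReader {
    private volatile boolean readFromNew = false;
    private final Supplier<List<Map<String, Object>>> oldQuery;
    private final Supplier<List<Map<String, Object>>> newQuery;
    ToggledReader(Supplier<List<Map<String, Object>>> oldQuery,
                  Supplier<List<Map<String, Object>>> newQuery) {
        this.oldQuery = oldQuery;
        this.newQuery = newQuery;
    }
    void switchToNew() { readFromNew = true; }
    void rollBack()    { readFromNew = false; }
    List<Map<String, Object>> query() {
        return readFromNew ? newQuery.get() : oldQuery.get();
    }
}
```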

Finally, update the data access layer to access only the new schema, and then delete the old data.

Except for this final stage, you can always roll back to the previous version should you encounter problems.

Upvotes: 2
