Reputation: 31
I want to capture changed data in Cosmos DB (JSON documents) using a scheduled job, via an Azure Function or some other way, without impacting the online performance of the database, since a web app and a mobile app read and write data in Cosmos DB through API calls. I want the data for offline ETL and analytical purposes, similar to the way Oracle provides offline redo log files.
Upvotes: 2
Views: 2399
Reputation: 8763
There are two options for doing this via Change Feed, depending on whether this needs to run as a batch job or can be streamed.
If it doesn't need to be a batch, you can use the built-in Azure Functions trigger. This is the simplest approach, but you can only start reading from the beginning or from the time the Azure Function starts. You can get started here: https://learn.microsoft.com/en-us/azure/cosmos-db/change-feed-functions
If it really does need to be a batch, you will need to use the Change Feed Processor library and configure the start time to go back to the last datetime the batch was run. Get started here: https://learn.microsoft.com/en-us/azure/cosmos-db/how-to-configure-change-feed-start-time. Sample code is here: https://github.com/Azure/azure-cosmos-dotnet-v3/blob/master/Microsoft.Azure.Cosmos.Samples/Usage/ChangeFeed/Program.cs
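To illustrate the "go back to the last datetime the batch was run" idea, here is a minimal sketch of the checkpoint bookkeeping only. It does not call the Cosmos DB SDK: `read_changes_since` is a hypothetical callable standing in for whatever Change Feed read you use, the checkpoint file and the `run_batch`, `load_checkpoint`, and `save_checkpoint` names are assumptions for illustration, and the only Cosmos-specific detail relied on is the `_ts` system property (last-modified time in epoch seconds).

```python
import json
import tempfile
from pathlib import Path


def load_checkpoint(path: Path) -> int:
    """Return the `_ts` (epoch seconds) reached by the last batch, or 0 if none."""
    if not path.exists():
        return 0
    return json.loads(path.read_text())["last_ts"]


def save_checkpoint(path: Path, last_ts: int) -> None:
    """Persist the newest `_ts` processed so the next batch can resume from it."""
    path.write_text(json.dumps({"last_ts": last_ts}))


def run_batch(read_changes_since, checkpoint_path: Path) -> list:
    """Run one batch: read changes after the checkpoint, then advance it.

    `read_changes_since(start_ts)` is a hypothetical stand-in for the real
    Change Feed read; it must return documents that carry a `_ts` field.
    """
    start_ts = load_checkpoint(checkpoint_path)
    changes = list(read_changes_since(start_ts))
    if changes:
        # Advance only after a successful read, using the newest change seen.
        save_checkpoint(checkpoint_path, max(doc["_ts"] for doc in changes))
    return changes


if __name__ == "__main__":
    # In-memory stand-in for a container, for demonstration only.
    docs = [{"id": "a", "_ts": 100}, {"id": "b", "_ts": 200}]

    def fake_reader(start_ts):
        return [d for d in docs if d["_ts"] > start_ts]

    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = Path(tmp) / "checkpoint.json"
        print(run_batch(fake_reader, checkpoint))  # first run sees both docs
        print(run_batch(fake_reader, checkpoint))  # second run sees nothing new
```

Because `_ts` has one-second resolution, a write landing in the same second as the checkpoint could be skipped by a strict "greater than" filter, so in practice you may prefer to resume slightly earlier and make the downstream ETL step idempotent.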
Some caveats to understand about Change Feed in Cosmos DB. First, it is not a true op-log: it only shows the most recent version (update) of an item in the collection and does not show deletes, so you will need to add an "isDeleted" property and set it to true to implement "soft deletes". Second, Change Feed does not tell you what has changed, only that something has changed. Third, Change Feed consumes a small amount of RU/s on the collection it is monitoring, and again when you issue the read to pull in the data, but this is small compared to the RU/s for writes. You will want to monitor RU consumption to ensure you leave enough headroom.
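The first two caveats can be worked around on the offline side. The following is a rough sketch, not an SDK feature: it keeps a snapshot of the last version seen for each item, treats the "isDeleted" flag as a soft delete, and diffs each incoming document against the snapshot to work out which fields changed. The `apply_changes` function, the event shape, and keying the snapshot by `id` alone are all assumptions for illustration (a partitioned container would likely need the partition key in the snapshot key as well).

```python
# Cosmos DB system properties that change on every write and are not user data.
SYSTEM_FIELDS = {"_ts", "_etag", "_rid", "_self", "_attachments"}


def apply_changes(snapshot: dict, changes: list) -> list:
    """Apply changed documents to an offline snapshot and describe each change.

    `snapshot` maps item id -> last document seen and is updated in place.
    Returns one event per changed document with the operation and field names.
    """
    events = []
    for doc in changes:
        doc_id = doc["id"]
        previous = snapshot.get(doc_id)

        if doc.get("isDeleted"):
            # Soft delete: the item still arrives as an update with the flag set.
            snapshot.pop(doc_id, None)
            events.append({"id": doc_id, "op": "delete", "changed": []})
            continue

        if previous is None:
            op = "insert"
            changed = sorted(k for k in doc if k not in SYSTEM_FIELDS)
        else:
            op = "update"
            keys = (set(doc) | set(previous)) - SYSTEM_FIELDS
            changed = sorted(k for k in keys if doc.get(k) != previous.get(k))

        snapshot[doc_id] = doc
        events.append({"id": doc_id, "op": op, "changed": changed})
    return events


if __name__ == "__main__":
    snapshot = {}
    print(apply_changes(snapshot, [{"id": "1", "name": "a", "_ts": 1}]))
    print(apply_changes(snapshot, [{"id": "1", "name": "b", "_ts": 2}]))
    print(apply_changes(snapshot, [{"id": "1", "name": "b", "isDeleted": True, "_ts": 3}]))
```

Since only the most recent version of an item is surfaced, the diff reflects the net change between two batch runs, not every intermediate update.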
Upvotes: 5