user246392

Reputation: 3019

CosmosDB: Efficiently migrate records from a large container

I created a container in CosmosDB that tracks metadata about each API call (timestamp, user id, method name, duration, etc.). The partition key is set to UserId and each id is a random Guid. This container also helps me enforce rate limiting for each user. So far so good. Now, I want to periodically clean up this container by moving records to an Azure Table (or something else) for long-term storage and reporting. Migrating records also helps me avoid the 20 GB logical partition size limit.
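For reference, here is a minimal sketch of the kind of document involved, shown with the azure-cosmos Python SDK purely for illustration; the account, database, container, and field names are placeholders.

    import uuid
    from datetime import datetime, timezone

    from azure.cosmos import CosmosClient

    # Placeholder endpoint and key.
    client = CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<key>")
    container = client.get_database_client("telemetry").get_container_client("ApiCalls")

    doc = {
        "id": str(uuid.uuid4()),                          # random Guid, one per API call
        "UserId": "user-123",                             # partition key
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "method": "GET /orders",
        "durationMs": 42,
    }
    container.create_item(body=doc)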

However, I have concerns about whether cross-partition queries will bite me eventually. Say I want to migrate all records that were created more than a week ago. Also, let's assume I have millions of active users, so this container sees a lot of activity, and I can't specify a partition key in my query. I'm reading that we should avoid cross-partition queries when RU/s and storage size are both large. See this. I have no idea how many physical partitions I'm going to end up dealing with in the future.
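To make the concern concrete, the migration query I have in mind would look something like the sketch below (Python SDK again, purely illustrative). Without a UserId filter it has to fan out across every physical partition.

    from datetime import datetime, timedelta, timezone

    from azure.cosmos import CosmosClient

    container = (
        CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<key>")
        .get_database_client("telemetry")
        .get_container_client("ApiCalls")
    )

    cutoff = (datetime.now(timezone.utc) - timedelta(days=7)).isoformat()

    old_items = container.query_items(
        query="SELECT * FROM c WHERE c.timestamp < @cutoff",
        parameters=[{"name": "@cutoff", "value": cutoff}],
        enable_cross_partition_query=True,  # no partition key can be supplied for this query
    )

    for item in old_items:
        pass  # copy to long-term storage (e.g. an Azure Table), then delete from Cosmos DB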

Is my design completely off? How can I efficiently migrate records? I'm hoping that the CosmosDB team can see this and help me find a solution to this problem.

Upvotes: 2

Views: 362

Answers (2)

Rob Reagan

Reputation: 7686

Based on your updated comments:

  • You are writing a CosmosDB document for each API request.
  • When an API call is made, you are querying CosmosDB for all API calls within a given time period, with the partition key being the userId. If the document count exceeds the threshold, you return an error such as an HTTP 429.
  • You want to store API call information for long-term analysis.

If your API is getting a lot of use from a lot of users, using CosmosDB is going to be expensive to scale, from both a storage and a processing standpoint.

For rate limiting, consider this rate limiting pattern using Redis cache. The StackExchange.Redis package is mature and has lots of guidance and code samples. It'll be a much lighter-weight and more scalable solution to your problem.

So for each API call, you would:

  1. Read the Redis key for the user making the call and check whether it exceeds your threshold.
  2. Increment the user's Redis key.
  3. Write the API invocation info to Azure Table Storage, probably with the PartitionKey being the userId and the RowKey being whatever makes sense for you, as sketched below.
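Here is a rough sketch of those three steps, using the redis-py and azure-data-tables packages (StackExchange.Redis is the .NET equivalent of the former); the limit, window, key names, and connection strings are all placeholders.

    import time

    import redis
    from azure.data.tables import TableClient

    r = redis.Redis(host="localhost", port=6379)
    table = TableClient.from_connection_string("<storage-connection-string>", table_name="ApiCalls")

    RATE_LIMIT = 100      # max calls per window (placeholder)
    WINDOW_SECONDS = 60   # window size (placeholder)

    def handle_api_call(user_id: str, method: str, duration_ms: int) -> bool:
        key = f"ratelimit:{user_id}"

        # 1. Read the user's counter and check it against the threshold.
        current = int(r.get(key) or 0)
        if current >= RATE_LIMIT:
            return False  # caller should respond with HTTP 429

        # 2. Increment the user's key; start the expiry window on the first hit.
        if r.incr(key) == 1:
            r.expire(key, WINDOW_SECONDS)

        # 3. Record the invocation in Azure Table Storage for long-term analysis.
        table.create_entity(entity={
            "PartitionKey": user_id,
            "RowKey": str(int(time.time() * 1000)),  # pick a RowKey scheme that is unique for your workload
            "Method": method,
            "DurationMs": duration_ms,
        })
        return True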

Upvotes: 2

4c74356b41

Reputation: 72171

The easier approach would be to use a time to live (TTL) and write events/data to both Cosmos DB and Table Storage at the same time, so that the data stays in Table Storage forever but is gone from Cosmos DB when the TTL expires. You can specify the TTL at the document level, so if you need some documents to live longer, that can be done.
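As a rough illustration with the azure-cosmos Python SDK (names and values are placeholders): the container gets a default TTL, and an individual document can override it with a ttl field.

    from azure.cosmos import CosmosClient, PartitionKey

    client = CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<key>")
    database = client.get_database_client("telemetry")

    # Default TTL of 7 days: documents vanish from Cosmos DB after that.
    container = database.create_container_if_not_exists(
        id="ApiCalls",
        partition_key=PartitionKey(path="/UserId"),
        default_ttl=7 * 24 * 60 * 60,
    )

    # Per-document override: this record lives for 30 days instead.
    container.create_item(body={
        "id": "some-guid",
        "UserId": "user-123",
        "timestamp": "2020-01-01T00:00:00Z",
        "ttl": 30 * 24 * 60 * 60,
    })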

Another approach might be using the change feed.
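Rough sketch of a change feed consumer that copies documents into Table Storage, shown with the Python SDKs for illustration (in practice an Azure Functions Cosmos DB trigger is a common way to host this loop, and the exact change feed arguments vary by SDK version):

    from azure.cosmos import CosmosClient
    from azure.data.tables import TableClient

    container = (
        CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<key>")
        .get_database_client("telemetry")
        .get_container_client("ApiCalls")
    )
    table = TableClient.from_connection_string("<storage-connection-string>", table_name="ApiCalls")

    # Read the change feed from the beginning (keyword name per azure-cosmos 4.x).
    for doc in container.query_items_change_feed(is_start_from_beginning=True):
        table.upsert_entity(entity={
            "PartitionKey": doc["UserId"],
            "RowKey": doc["id"],
            "Method": doc.get("method"),
            "DurationMs": doc.get("durationMs"),
            "CalledAt": doc.get("timestamp"),
        })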

Upvotes: 2
