mluker

Reputation: 702

Is there a way to use the TTL feature of CosmosDb but always keep at least n records

I have diagnostic data for devices being written to Cosmos DB. Some devices write thousands of messages a day while others write just a few. I always want there to be diagnostic data for each device regardless of when it was added, but I don't want to retain all of it forever. A TTL of 90 days works fine for the very active devices: they always have diagnostic data because they send it in daily. The less active devices, however, lose their diagnostic logs once the TTL expires.

Is there a way to use the TTL feature of CosmosDb but always keep at least n records?

I am looking for something like keeping only the records from the last 90 days (TTL), but always keeping at least 100 documents per device regardless of their last-updated timestamp.

Upvotes: 2

Views: 824

Answers (2)

NotFound

Reputation: 6192

Short answer: there's no such built-in functionality.

You could create your own Function App with a timer (schedule) trigger that runs a query such as:

SELECT * 
FROM c
WHERE c.deviceId = @deviceId --run once per device; deviceId is a hypothetical property name
AND NOT IS_DEFINED(c.ttl) --only update items that have no ttl
ORDER BY c._ts DESC
OFFSET 100 LIMIT 2147483647 --skip the newest 100

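and then update the returned items by setting a ttl on each of them. That way you can be sure that the newest 100 records remain available (assuming no other process deletes them), while the remaining items are cleaned up periodically. Keep in mind that the update resets the TTL countdown: TTL is measured from the item's _ts, and the update changes _ts, so a marked item lives for its full ttl counted from the update rather than from when it was first written.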
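A minimal Python sketch of what the scheduled function would do for the items of a single device, mirroring the query above in memory rather than calling the Cosmos DB SDK. It assumes each item is a dict with `id`, `_ts` (epoch seconds) and an optional `ttl`; the helper names and the 90-day retention value are hypothetical. Because the update bumps `_ts`, the sketch writes the *remaining* lifetime as the ttl instead of a fixed 90 days.

```python
from typing import Any, Dict, List

KEEP_NEWEST = 100                # same as OFFSET 100 in the query
RETENTION_S = 90 * 24 * 60 * 60  # hypothetical 90-day retention, in seconds


def select_items_to_expire(items: List[Dict[str, Any]], keep: int = KEEP_NEWEST) -> List[Dict[str, Any]]:
    """In-memory equivalent of:
    WHERE NOT IS_DEFINED(c.ttl) ORDER BY c._ts DESC OFFSET keep
    """
    without_ttl = [it for it in items if "ttl" not in it]
    without_ttl.sort(key=lambda it: it["_ts"], reverse=True)
    return without_ttl[keep:]


def remaining_ttl(item: Dict[str, Any], now: int, retention_s: int = RETENTION_S) -> int:
    """ttl to write so the item expires roughly retention_s after its current _ts.

    Writing the item bumps _ts to 'now' and Cosmos DB counts ttl from _ts,
    so the remaining lifetime is computed here. A ttl must be a positive
    integer (or -1), hence the floor of 1 second.
    """
    age = now - item["_ts"]
    return max(1, retention_s - age)
```

For 150 items of one device that have no ttl yet, `select_items_to_expire` returns the 50 oldest; the function would then replace each of those with `ttl` set to `remaining_ttl(item, now)`. Note that item-level ttl only takes effect if TTL is enabled on the container (for example with a default of -1).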

Upvotes: 2

David Makogon

Reputation: 71112

There are no built-in quantity-based filters for TTL: you either have collection-based TTL, or collection+item TTL (item-level TTL overriding the default set on the collection).
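To make the two TTL modes concrete, here is a small Python model of how the container default and an item's own `ttl` combine. It is a sketch of the documented rules, not an SDK call; the function name `effective_ttl` is hypothetical, and `None` stands for "not set".

```python
from typing import Optional


def effective_ttl(container_default_ttl: Optional[int], item_ttl: Optional[int]) -> Optional[int]:
    """Seconds after the item's _ts at which it expires, or None if it never expires.

    container_default_ttl: None  -> TTL off on the container; item ttl is ignored
                           -1    -> TTL on, items never expire unless they set ttl
                           n > 0 -> items expire n seconds after last modification
    item_ttl: None -> inherit the container default; -1 -> never expire; n > 0 -> n seconds
    """
    if container_default_ttl is None:
        return None
    ttl = container_default_ttl if item_ttl is None else item_ttl
    return None if ttl == -1 else ttl
```

With a container default of -1, only items that carry their own ttl ever expire, which is the setup a "mark items for expiry" approach relies on.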

You'd need to build something yourself: mark documents as eligible for deletion (based on age, perhaps), and then run a periodic cleanup routine that decides what to actually delete based on item counts, the age of delete-eligible items, and so on.
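One possible shape for that cleanup routine, as a Python sketch over items already read into memory: a document is delete-eligible when it is older than the retention window, but the newest `min_keep` documents of each device are never deleted. The `deviceId` property and the `plan_deletions` name are hypothetical, and the actual deletes (or setting a short ttl instead) are left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def plan_deletions(items: Iterable[dict], now: int, max_age_s: int, min_keep: int) -> List[str]:
    """Return ids of documents that are older than max_age_s and are not
    among the newest min_keep documents of their device."""
    by_device: Dict[str, List[dict]] = defaultdict(list)
    for doc in items:
        by_device[doc["deviceId"]].append(doc)  # deviceId: hypothetical property name

    doomed: List[str] = []
    for docs in by_device.values():
        docs.sort(key=lambda d: d["_ts"], reverse=True)
        # The newest min_keep documents of each device are protected regardless of age.
        for doc in docs[min_keep:]:
            if now - doc["_ts"] > max_age_s:  # delete-eligible: outside the retention window
                doomed.append(doc["id"])
    return doomed
```

Grouping per device is what gives low-volume devices their guaranteed history: a device with fewer than `min_keep` documents never loses any, however old they are.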

Alternatively, you could treat low-volume and high-volume devices differently: write high-volume device telemetry to TTL-based collections and low-volume device telemetry to collections without TTL (or something along those lines).
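A rough Python sketch of the routing decision, under the assumption that you track each device's average write rate. The threshold is a heuristic, not a guarantee: a device writing at least `min_keep / retention_days` messages a day (about 1.1 a day for 100 documents over 90 days) should on average still have `min_keep` documents inside the TTL window. Both container names are hypothetical.

```python
def choose_container(avg_msgs_per_day: float, min_keep: int = 100, retention_days: int = 90) -> str:
    """Pick a container for a device's diagnostics based on its write rate.

    Heuristic: a device writing at least min_keep / retention_days messages a day
    should, on average, still have min_keep documents inside the TTL window.
    """
    threshold = min_keep / retention_days
    if avg_msgs_per_day >= threshold:
        return "diagnostics-ttl"      # hypothetical container with a default TTL
    return "diagnostics-no-ttl"       # hypothetical container with TTL off
```

Bursty devices or devices whose activity drops off over time can still fall below `min_keep`, so this works best combined with a periodic check that moves devices between the two containers.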

tl;dr this isn't something built-in.

Upvotes: 3
