Reputation: 1659
I am using Cassandra 2.0.
My write load is somewhat similar to the queueing antipattern mentioned here: datastax
I am looking at pushing 30-40 GB of data into Cassandra every 24 hours and expiring that data within 24 hours. My current approach is to set a TTL on everything I insert.
I am experimenting with how I partition my data as seen here: cassandra wide vs skinny rows
I have two column families. The first contains metadata and the second contains data. There are N metadata rows per data row, and a metadata row may be rewritten M times throughout the day to point to a new data row.
I suspect that the metadata churn is hurting reads, since finding the right metadata may require scanning all M versions.
I suspect that the data churn is leading to excessive work compacting and garbage collecting.
It seems like creating a keyspace for each day and dropping the old keyspace after 24 hours would remove the need to do compaction entirely.
Aside from having to handle issues with what keyspace the user reads from on requests that overlap keyspaces, are there any other major flaws with this plan?
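To make the plan concrete, here is a minimal CQL sketch of the keyspace-per-day idea, contrasted with the current TTL approach. The keyspace and table names (`events_20140101`, `data`) and the replication settings are hypothetical placeholders, not from the question:

```sql
-- Current approach: one long-lived table, every row expires via TTL.
-- Expired rows become tombstones that must be compacted away.
INSERT INTO data (id, payload) VALUES (now(), 0xCAFE) USING TTL 86400;

-- Proposed approach: one keyspace per day; the application writes
-- to today's keyspace and drops yesterday's whole.
CREATE KEYSPACE events_20140101
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- Dropping a keyspace deletes its SSTables outright, so the expired
-- data never generates tombstones or compaction work.
DROP KEYSPACE events_20131231;
```

Readers that span midnight would have to query both keyspaces and merge the results client-side, which is the overlap issue mentioned above.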
Upvotes: 1
Views: 238
Reputation: 29
In my experience, partitioning the data by time is a much better idea than relying on TTL.
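One way to read "partitioning" here is a table (or keyspace) per time window that can be dropped wholesale, rather than per-row TTLs. A minimal sketch, with hypothetical names (`events_20140101`, `day`, `id`, `payload`):

```sql
-- One table per day: expiry is a metadata-only DROP, not millions
-- of tombstones waiting for compaction.
CREATE TABLE events_20140101 (
    day text,            -- partition key bucket, e.g. '2014-01-01'
    id timeuuid,         -- clustering column orders rows within a day
    payload blob,
    PRIMARY KEY (day, id)
);

-- Expire yesterday's data in one cheap operation.
DROP TABLE events_20131231;
```

Dropping a table removes its SSTables directly, so the write/expire churn never enters the compaction path the way TTL-expired rows do.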
Upvotes: 1