Reputation: 431
This question is directed at experienced Cassandra developers. I need to count how many times, and on which dates, each user accessed a resource. I have a data structure like this (CQL):
CREATE TABLE IF NOT EXISTS access_counter_table (
    access_number counter,
    resource_id varchar,
    user_id varchar,
    dateutc varchar,
    PRIMARY KEY (user_id, dateutc, resource_id)
);
I need to find out how many times a user has accessed each resource over the last N days. To get the last 7 days, I run a query like this:
SELECT * FROM access_counter_table
WHERE user_id = 'user_1'
  AND dateutc > '2015-04-03'
  AND dateutc <= '2015-04-10';
And I get something like this:
user_1 : 2015-04-10 : [resource1:1, resource2:4]
user_1 : 2015-04-09 : [resource1:3]
user_1 : 2015-04-08 : [resource1:1, resource3:2]
...
So, my problem is this: old data must be deleted after some time, but Cassandra does not allow setting a TTL on counter tables.
I have millions of access events per hour (and it could grow to billions), and after 7 days those records are useless.
Thanks.
Upvotes: 1
Views: 617
Reputation: 4426
As you've found, Cassandra does not support TTLs on Counter columns. In fact, deletes on counters in Cassandra are problematic in general (once you delete a counter, you essentially cannot reuse it for a while).
If you need automatic expiration, you can model the count as an int column instead of a counter. A plain int cannot be incremented atomically, so you need to make the read-modify-write safe yourself: use external locking (such as ZooKeeper), request routing (only allow one writer to access a particular partition), or lightweight transactions to increment that int column and write it with a TTL.
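The lightweight-transaction variant could look roughly like the sketch below. The table name `access_count_table` is hypothetical, the 604800-second TTL is simply 7 days taken from the question, and the values 3 and 4 stand for "the count the client just read" and "that count plus one". The client has to retry when the condition fails.

```cql
-- Hypothetical table: same key as the original, but a plain int instead of a counter.
CREATE TABLE IF NOT EXISTS access_count_table (
    user_id varchar,
    dateutc varchar,
    resource_id varchar,
    access_number int,
    PRIMARY KEY (user_id, dateutc, resource_id)
);

-- First access of the day: create the row only if it does not exist yet.
INSERT INTO access_count_table (user_id, dateutc, resource_id, access_number)
VALUES ('user_1', '2015-04-10', 'resource1', 1)
IF NOT EXISTS
USING TTL 604800;

-- Later accesses: read the current value, then compare-and-set.
-- If [applied] comes back false, another writer got in first: re-read and retry.
UPDATE access_count_table USING TTL 604800
SET access_number = 4
WHERE user_id = 'user_1' AND dateutc = '2015-04-10' AND resource_id = 'resource1'
IF access_number = 3;
```

Each conditional write costs several round trips between replicas, so at millions of events per hour you would probably want to aggregate increments in the application and write them in batches rather than issue one transaction per access.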
Alternatively, you can keep the counter table, page through it in a scheduled task, and remove "old" counters manually with DELETE. This is less elegant and doesn't scale as well, but it may work in some cases.
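As a rough sketch of that cleanup job against the table from the question: the job first enumerates the partitions, then deletes one expired day per user. The date '2015-04-03' is only an example cutoff; the scheduling and paging would live in client code that is not shown here.

```cql
-- Enumerate partition keys; this is a full scan, so page through it in the client.
SELECT DISTINCT user_id FROM access_counter_table;

-- For each user, drop every resource counter recorded for one expired day.
DELETE FROM access_counter_table
WHERE user_id = 'user_1' AND dateutc = '2015-04-03';
```

Because `dateutc` is part of the key, a deleted day is never written to again, which sidesteps the problem of reusing a deleted counter.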
Upvotes: 2