Best way to store last-touched time in Cassandra

I'm storing a last-touched time in a User table in Postgres, but the updates are so frequent and contended that I see deadlocks, in some cases among three instances of the same update.

Cassandra seems like a better fit for this, but should I devote a table to just this purpose? I don't need old timestamps, just the latest. Should I use something other than Cassandra? If Cassandra is the right choice, any tips on table properties?

The table I have in mind:

CREATE TABLE ksp1.user_last_job_activities (
    user_id bigint,
    touched_at timeuuid,
    PRIMARY KEY (user_id, touched_at)
) WITH CLUSTERING ORDER BY (touched_at DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Update

Thanks! I experimented with writetime, but since I had to write a value anyway, I just wrote the time itself.

Like so:

CREATE TABLE simple_user_last_activity (
    user_id bigint,
    touched_at timestamp,
    PRIMARY KEY (user_id)
);

Then:

INSERT INTO simple_user_last_activity (user_id, touched_at) VALUES (6, dateof(now()));
SELECT touched_at from simple_user_last_activity WHERE user_id = 6;

Since touched_at is no longer in the primary key, only one record per user is stored.
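For comparison, here is a rough sketch of what the writetime alternative could look like. The table name and the placeholder column `seen` are made up for illustration; `WRITETIME()` only works on regular (non-primary-key) columns, which is why some value has to be written anyway.

```cql
-- Hypothetical table: "seen" exists only so there is a regular column to write.
CREATE TABLE user_last_touch_by_writetime (
    user_id bigint,
    seen boolean,
    PRIMARY KEY (user_id)
);

-- Each write to the same user_id overwrites the previous row (upsert).
INSERT INTO user_last_touch_by_writetime (user_id, seen) VALUES (6, true);

-- WRITETIME returns the write timestamp of the column,
-- in microseconds since the epoch.
SELECT WRITETIME(seen) FROM user_last_touch_by_writetime WHERE user_id = 6;
```

Storing the timestamp directly, as above, avoids the placeholder column and gives a readable timestamp value instead of a raw microsecond count.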

Update 2

There's another option that I am going to go with. I can store the job_id too, which gives more data for analytics:

CREATE TABLE final_user_last_job_activities (
    user_id bigint,
    touched_at timestamp,
    job_id bigint,
    PRIMARY KEY (user_id, touched_at)
) WITH CLUSTERING ORDER BY (touched_at DESC)
    AND default_time_to_live = 604800;

Adding the 1-week TTL takes care of expiring old records. If a user has no records left, I return the current time.

INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 5);
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 6);
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 7);
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 6);

SELECT * FROM final_user_last_job_activities WHERE user_id = 5 LIMIT 1;

Which gives me:

 user_id | touched_at               | job_id
---------+--------------------------+--------
       5 | 2015-06-17 12:43:30+1200 |      6

Simple benchmarks show no significant performance difference in storing or reading from the bigger table.
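To check that the table-level TTL is actually being applied, the remaining lifetime of a row can be inspected with the `TTL()` function. This is only a sketch against the table above; `TTL()` works on regular columns such as `job_id`, not on primary-key columns.

```cql
-- Latest row for user 5, plus the seconds left before job_id expires.
-- With default_time_to_live = 604800, a fresh row should report a value
-- just under one week.
SELECT touched_at, job_id, TTL(job_id)
FROM final_user_last_job_activities
WHERE user_id = 5
LIMIT 1;
```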

Upvotes: 3
