Reputation: 513
I'm storing a last-touched time in a User table in Postgres, but there are many frequent updates and enough contention that I can see examples of 3 of the same updates deadlocking.
Cassandra seems a better fit for this - but should I devote a table to just this purpose? And I don't need old timestamps, just the latest. Should I use something other than Cassandra? If I should use Cassandra, any tips on table properties?
The table I have in mind:
CREATE TABLE ksp1.user_last_job_activities (
user_id bigint,
touched_at timeuuid,
PRIMARY KEY (user_id, touched_at)
) WITH CLUSTERING ORDER BY (touched_at DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
Update
Thanks! I did some experiments around writetime and since I had to write a value anyway, I just wrote the time.
Like so:
CREATE TABLE simple_user_last_activity (
user_id bigint,
touched_at timestamp,
PRIMARY KEY (user_id)
);
Then:
INSERT INTO simple_user_last_activity (user_id, touched_at) VALUES (6, dateof(now()));
SELECT touched_at from simple_user_last_activity WHERE user_id = 6;
Since touched_at is no longer in the primary key, only one record per user is stored.
Update 2
There's another option that I am going to go with. I can store the job_id too, which gives more data for analytics:
CREATE TABLE final_user_last_job_activities (
user_id bigint,
touched_at timestamp,
job_id bigint,
PRIMARY KEY (user_id, touched_at)
)
WITH CLUSTERING ORDER BY (touched_at DESC)
AND default_time_to_live = 604800;
Adding the 1-week TTL takes care of expiring records - if there are none I return current time.
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 5);
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 6);
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 7);
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 6);
SELECT * FROM final_user_last_job_activities LIMIT 1;
Which gives me:
user_id | touched_at | job_id
---------+--------------------------+--------
5 | 2015-06-17 12:43:30+1200 | 6
Simple benchmarks show no significant performance difference in storing or reading from the bigger table.
Upvotes: 3
Views: 4395
Reputation: 239
Cassandra has implicit support for writetime
per each column. See this, looks like that is what you are looking for here.
Upvotes: 1
Reputation: 7305
Because c* is last write wins, you can simply keep the latest versions of each row.
You could, as MSD suggests, use writetime
to pull the time of the write. But be careful because this is column specific and you can't use write time on your primary key columns. For example in a table as follows:
cqlsh> create TABLE test.test ( a int, b int, c int, d int, primary key (a))
... ;
cqlsh> insert INTO test.test (a, b, c, d) VALUES ( 1,2,3,4)
... ;
cqlsh> select * from test.test
... ;
a | b | c | d
---+------+---+------
1 | 2 | 3 | 4
(2 rows)
cqlsh> insert into test.test (a,c) values (1, 6);
cqlsh> select * from test.test ;
a | b | c | d
---+------+---+------
1 | 2 | 6 | 4
(2 rows)
cqlsh> select writetime(a), writetime(b), writetime(c), writetime(d) from test.test
... ;
InvalidRequest: code=2200 [Invalid query] message="Cannot use selection function writeTime on PRIMARY KEY part a"
cqlsh> select writetime(b), writetime(c), writetime(d) from test.test ;
writetime(b) | writetime(c) | writetime(d)
------------------+------------------+------------------
1434424690700887 | 1434424690700887 | 1434424702420929
Otherwise you can add a cql column with the timestamp:
create TABLE test.test ( a int, b int, c int, d int, touched_at timeuuid, primary key (a)) ;
Some quick benchmarking would help you determine which is more performant.
Upvotes: 4