Reputation: 693
I have a column family, posts, which stores the post details of my Facebook account. I am using Cassandra 2.0.9 and the DataStax Java driver 3.0.
CREATE TABLE posts (
key blob,
column1 text,
value blob,
PRIMARY KEY ((key), column1)
) WITH COMPACT STORAGE;
Here the row key is my user ID, the column key is the post ID, and the value is the post JSON. Whenever I refresh my application in the browser, it fetches data from Facebook, then deletes and re-inserts the data for the existing post IDs. Sometimes posts go missing from Cassandra. Can frequent deletes and inserts on the same column of a row cause data loss? How can I manage this?
Upvotes: 1
Views: 924
Reputation: 8812
It's not really data loss. If you update the same column at a very high frequency (thousands of updates per second, say), you may get unpredictable results.
Why? Because Cassandra uses the write timestamp to determine, at read time, which value is the right one: it compares the timestamps of the same column from different replicas.
Currently, the resolution of the timestamp is on the order of milliseconds, so if your update rate is very high, for example 2 updates to the same column within the same millisecond, the bigger post JSON will win.
By bigger, I mean the result of postJson1.compareTo(postJson2). The ordering is determined by the type of your column; in your case it's a String, so Cassandra breaks the tie by comparing the post JSON data lexicographically.
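To make the tie-break concrete, here is a minimal sketch of the rule as described above: the higher timestamp wins, and on equal timestamps the value that compares greater wins. The LwwCell class and its reconcile method are hypothetical names for illustration only; this is a simplified model, not Cassandra's actual reconciliation code.

```java
/** Simplified, hypothetical model of last-write-wins resolution (not Cassandra's code). */
final class LwwCell {
    final long timestamp;  // write timestamp attached to the cell
    final String value;    // the post JSON stored in the cell

    LwwCell(long timestamp, String value) {
        this.timestamp = timestamp;
        this.value = value;
    }

    /** Returns the cell that a read would keep when both versions are seen. */
    static LwwCell reconcile(LwwCell a, LwwCell b) {
        // Different timestamps: the most recent write wins.
        if (a.timestamp != b.timestamp) {
            return a.timestamp > b.timestamp ? a : b;
        }
        // Same timestamp: fall back to comparing the values themselves,
        // so the "bigger" value wins regardless of arrival order.
        return a.value.compareTo(b.value) >= 0 ? a : b;
    }
}
```

With two writes carrying the same timestamp, the result depends only on the JSON content, not on which write the application issued last, which is why the outcome looks unpredictable from the client's point of view.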
To avoid this, you can provide the write timestamp on the client side by generating a unique timeuuid yourself.
There are many ways to generate such a TimeUUID, for example the Java driver's com.datastax.driver.core.utils.UUIDs.timeBased() method.
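As a related sketch using only the JDK, the client can also hand out strictly increasing microsecond timestamps and pass them to the write with CQL's USING TIMESTAMP clause. This is a different technique from the TimeUUID suggestion above, and the MonotonicTimestamps class is a hypothetical helper, not part of the driver. It assumes all writes for a given row come from a single client process; it gives no ordering guarantee across several clients.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: strictly increasing microsecond timestamps within one JVM. */
final class MonotonicTimestamps {
    // Last timestamp handed out, in microseconds since the epoch.
    private final AtomicLong last = new AtomicLong();

    /** Returns a timestamp greater than every value returned before. */
    long next() {
        // The wall clock only has millisecond resolution here, so scale to microseconds.
        long now = System.currentTimeMillis() * 1000L;
        // If the clock has not advanced, bump the previous value by one microsecond.
        return last.updateAndGet(prev -> Math.max(now, prev + 1));
    }
}
```

Each value from next() could then be bound to a statement such as INSERT INTO posts (key, column1, value) VALUES (?, ?, ?) USING TIMESTAMP ?, so that a delete followed by an insert from the same client never shares a timestamp.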
Upvotes: 3