Reputation: 360
I have a table. For simplicity, let's say this is its definition:
CREATE TABLE t (pk1 varchar, pk2 varchar, c1 varchar, c2 varchar, PRIMARY KEY(pk1, pk2));
I do multiple actions on it in parallel using the full PK:
INSERT INTO t (pk1, pk2, c1, c2) values (?, ?, ?, ?) IF NOT EXISTS;
DELETE FROM t where pk1 = ? AND pk2 = ?;
UPDATE t set c1 = ? where pk1 = ? AND pk2 = ? IF EXISTS;
Note:
Using these commands I should never have rows with c2 = null. The problem is that every now and then I do see such rows. I can't easily reproduce it but it always happens when I stress the system (multiple parallel clients running: insert, update, delete with the same PK).
Edit: my cluster size is 4 with RF=2 (NetworkTopologyStrategy with 1 DC) and I use CL=QUORUM for all queries.
Am I missing something or is there a bug in LWT?
Upvotes: 0
Views: 1826
Reputation: 6224
https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlLtwtTransactions.html
If lightweight transactions are used to write to a row within a partition, only lightweight transactions for both read and write operations should be used. This caution applies to all operations, whether individual or batched. For example, the following series of operations can fail:
DELETE ...
INSERT .... IF NOT EXISTS
SELECT ....
The following series of operations will work:
DELETE ... IF EXISTS
INSERT .... IF NOT EXISTS
SELECT .....
Note: judging from a bug we recently encountered, the same is true for the INSERT and UPDATE combination as well. If you use lightweight transactions, use them for all related statements. The bug may be related to slightly different timestamps, which is explained better here:
https://jira.apache.org/jira/browse/CASSANDRA-14304
doanduyhai DOAN DuyHai added a comment - 24/Mar/18 15:12
Hints:
1) LWT operations are using the ballot based on an agreement of timestamp value between QUORUM of replicas. It can happens that the timestamp is slightly incremented (some microsecs) in case of conflict/contention on the cluster. The consequence is that the timestamp used for LWT can be slightly (again in microsecs) in the future. Read the source code to check the part of the code responsible for ballot agreement with Paxos
2) For the DELETE request:
a) it can use the <current> timestamp, which can belong to the "past" with respect to the one used by LWT, thus SELECT does return a value
Upvotes: 2
Reputation: 477
One possible explanation is that the DELETE executes in parallel with the UPDATE but, not being a lightweight transaction, carries no guarantee of having been applied.
Example (simplified; other scenarios may lead to the same result):
Assume a cluster of 3 nodes with RF=3 and a temporary connection issue between node1 and the other nodes (node2, node3).
The DELETE with CL=ONE is executed against node1 with timestamp T1 (and is not applied on node2 or node3).
The UPDATE is executed against node2 and node3 with timestamp T2 (T2 > T1).
The connection between node1, node2 and node3 is restored. The tombstone that the DELETE introduced now removes all data written before T1 (including c2), while the UPDATE only set pk1, pk2 and c1, leaving c2 as null.
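The scenario above can be sketched as a small last-write-wins model in Python. This is only an illustration under stated assumptions, not Cassandra code: `merge_row` and the timestamps T0, T1, T2 are hypothetical, and the sketch assumes an earlier INSERT at T0 (T0 < T1) wrote both c1 and c2.

```python
# Simplified, hypothetical model of last-write-wins reconciliation.
# It is not driver or server code; it only mirrors the example above.

def merge_row(cells, row_tombstone_ts):
    """Return the columns still visible after applying a row tombstone.

    cells: dict mapping column name -> (value, write_timestamp)
    row_tombstone_ts: timestamp of the row deletion, or None
    A cell survives only if it was written after the tombstone.
    """
    visible = {}
    for column, (value, ts) in cells.items():
        if row_tombstone_ts is None or ts > row_tombstone_ts:
            visible[column] = value
    return visible


T0, T1, T2 = 100, 200, 300  # assumed ordering: T0 < T1 < T2

# Assumption: an INSERT at T0 wrote both regular columns.
cells = {"c1": ("a", T0), "c2": ("b", T0)}

# The UPDATE at T2 (applied on node2, node3) overwrites only c1.
cells["c1"] = ("a2", T2)

# The DELETE at T1 reached only node1; once the replicas are
# reconciled, its tombstone is merged with the cells above.
row = merge_row(cells, row_tombstone_ts=T1)

c2_value = row.get("c2")  # None: the c2 cell from T0 is shadowed
print(row)
```

In this model the c1 cell written at T2 outlives the tombstone, so the row is still returned, while c2 (last written at T0) is gone and reads as null.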
If you apply the DELETE using LWT, this should not happen as long as TTL is not used.
TTL can be set either directly in the insert statements or by default via a table property. To check this, you can execute the following:
DESCRIBE TABLE returns the default_time_to_live that is set for this table.
A SELECT ttl(c2) ...
returns a value if a TTL was set.
Upvotes: 0