overexchange

Reputation: 1

Why does Cassandra not give an error on inserting a duplicate record?

CQL version 5.0.1

Cassandra version 3.11.9

root@3fdb49de030c:/# cqlsh
Connected to Test Cluster at 127.0.0.1:1234.
[cqlsh 5.0.1 | Cassandra 3.11.9 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>

On inserting the same record (as shown below):

cqlsh> use mykeyspace;
cqlsh:mykeyspace> select * from mytable ;

 column1                              | column2                              | column3                         | column4           | column5                         | column6
--------------------------------------+--------------------------------------+---------------------------------+-------------------+---------------------------------+------------------
 54dc8b12-a934-4f2e-8a0d-e7eba3faa47e | 587df674-bc77-481b-b974-ddebd203e371 | 2021-04-02 13:29:37.841000+0000 | [email protected] | 2021-04-02 13:29:37.841000+0000 | [{"abc": "def"}]

(1 rows)
cqlsh:mykeyspace> // does not give an error on changing non-primary key columns
cqlsh:mykeyspace> INSERT INTO mytable (column1, column2, column4, column6, column3, column5) VALUES (54dc8b12-a934-4f2e-8a0d-e7eba3faa47e, 587df674-bc77-481b-b974-ddebd203e371, '[email protected]', 'xyz', toTimestamp(now()), toTimestamp(now()));
cqlsh:mykeyspace> select * from mytable ;

 column1                              | column2                              | column3                         | column4           | column5                         | column6
--------------------------------------+--------------------------------------+---------------------------------+-------------------+---------------------------------+---------
 54dc8b12-a934-4f2e-8a0d-e7eba3faa47e | 587df674-bc77-481b-b974-ddebd203e371 | 2021-04-02 13:29:37.841000+0000 | [email protected] | 2021-04-02 13:29:37.841000+0000 |     xyz

(1 rows)
cqlsh:mykeyspace> // does not give an error on changing one of the primary key columns
cqlsh:mykeyspace> INSERT INTO mytable (column1, column2, column4, column6, column3, column5) VALUES (54dc8b12-a934-4f2e-8a0d-e7eba3faa47e, 587df674-bc77-481b-b974-ddebd203e372, 'garbage', 'garbage', toTimestamp(now()), toTimestamp(now()));
cqlsh:mykeyspace> select * from mytable ; // PK(column1:column2)

 column1                              | column2                              | column3                         | column4           | column5                         | column6
--------------------------------------+--------------------------------------+---------------------------------+-------------------+---------------------------------+---------
 54dc8b12-a934-4f2e-8a0d-e7eba3faa47e | 587df674-bc77-481b-b974-ddebd203e371 | 2021-04-02 14:20:24.028000+0000 | [email protected] | 2021-04-02 14:20:24.028000+0000 |     xyz
 54dc8b12-a934-4f2e-8a0d-e7eba3faa47e | 587df674-bc77-481b-b974-ddebd203e372 | 2021-04-02 14:26:55.101000+0000 |           garbage | 2021-04-02 14:26:55.101000+0000 | garbage

(2 rows)
cqlsh:mykeyspace> DESCRIBE TABLE mytable;

CREATE TABLE mykeyspace.mytable (
    column1 uuid,
    column2 uuid,
    column3 timestamp,
    column4 text,
    column5 timestamp,
    column6 text,
    PRIMARY KEY (column1, column2)
) WITH CLUSTERING ORDER BY (column2 ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';


Does Cassandra not show an error on inserting a duplicate record (with the same primary key)? Instead, it updates the record.

If not, how can I treat duplicate records as an error when using gocql to insert the same record (for an HTTP POST request)? We are yet to use an idempotency key.

An RDBMS such as SQL Server gives an error like "Cannot insert duplicate key row".

Upvotes: 3

Views: 1026

Answers (1)

Aaron

Reputation: 57798

Unfortunately, Cassandra is known for its "sharp edges" and this is one of them.

Cassandra doesn't see a difference between an INSERT and an UPDATE; they are essentially the same operation (an "upsert"). The keys provided indicate where the data should be written, and the log-structured storage engine puts it there. This means that you can actually perform an update with an INSERT, and an insert with an UPDATE.
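
For example, an UPDATE whose WHERE clause names a primary key that has no row yet quietly creates that row instead of failing. Here is a small sketch against the table above; the second UUID is a hypothetical, not-yet-used clustering key value:

-- hypothetical clustering key value, assumed not to exist yet
UPDATE mytable SET column4 = 'created-by-update'
WHERE column1 = 54dc8b12-a934-4f2e-8a0d-e7eba3faa47e
  AND column2 = 11111111-1111-1111-1111-111111111111;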

For it to work the way you are describing, Cassandra would have to know whether a value already exists for the provided keys. The only way to do that would be to first read the data at those keys, and then write. In a concurrent, high-throughput distributed scenario, that is not safe, because another write could happen between finishing the read and executing the write.

If you need this behavior, you can invoke a lightweight transaction with the IF [NOT] EXISTS conditional:

INSERT INTO mytable (column1, column2, column4, column6, column3, column5)
VALUES (54dc8b12-a934-4f2e-8a0d-e7eba3faa47e, 587df674-bc77-481b-b974-ddebd203e371, '[email protected]', 'xyz', toTimestamp(now()), toTimestamp(now()))
IF NOT EXISTS;

This essentially executes a read before the write behind the scenes, and therefore does have some performance implications.
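
The result of such a conditional write includes an [applied] column, which comes back false when a row with that primary key already exists. Since the question mentions gocql, here is a minimal sketch (assuming a local node and the mykeyspace.mytable schema above; the duplicate-handling is hypothetical, not part of the original answer) of checking that flag from Go so a duplicate can be surfaced as an error, e.g. an HTTP 409, instead of a silent upsert:

package main

import (
    "fmt"
    "log"

    "github.com/gocql/gocql"
)

func main() {
    // Hypothetical connection details; point these at your own node and keyspace.
    cluster := gocql.NewCluster("127.0.0.1")
    cluster.Keyspace = "mykeyspace"
    session, err := cluster.CreateSession()
    if err != nil {
        log.Fatal(err)
    }
    defer session.Close()

    c1, _ := gocql.ParseUUID("54dc8b12-a934-4f2e-8a0d-e7eba3faa47e")
    c2, _ := gocql.ParseUUID("587df674-bc77-481b-b974-ddebd203e371")

    // MapScanCAS reports whether the conditional insert was applied; when it
    // was not, the existing row is written into `previous`.
    previous := map[string]interface{}{}
    applied, err := session.Query(
        `INSERT INTO mytable (column1, column2, column4, column6, column3, column5)
         VALUES (?, ?, ?, ?, toTimestamp(now()), toTimestamp(now()))
         IF NOT EXISTS`,
        c1, c2, "[email protected]", "xyz",
    ).MapScanCAS(previous)
    if err != nil {
        log.Fatal(err)
    }

    if !applied {
        // Duplicate primary key: report it to the caller (e.g. HTTP 409 Conflict).
        fmt.Printf("row already exists: %v\n", previous)
        return
    }
    fmt.Println("row inserted")
}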

Upvotes: 5
