L.Luo

Reputation: 31

Cassandra queries perform a full table scan if no rows exist for a specific partition key

I have a very large table defined like this:

CREATE TABLE IF NOT EXISTS profile (
    account_id  text,
    user_id uuid,
    user_data text,
    creation_date timestamp,
    update_date timestamp,
    PRIMARY KEY ((account_id, user_id))
) WITH bloom_filter_fp_chance = 0.01
   AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
   AND comment = ''
   AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
   AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
   AND crc_check_chance = 1.0
   AND dclocal_read_repair_chance = 0.1
   AND default_time_to_live = 0
   AND gc_grace_seconds = 864000
   AND max_index_interval = 2048
   AND memtable_flush_period_in_ms = 0
   AND min_index_interval = 128
   AND read_repair_chance = 0.0
   AND speculative_retry = '99PERCENTILE';

The following query runs a full table scan if the table has no rows matching the partial partition key (account_id = 'D-F-8CX7PGX'):

SELECT * FROM profile WHERE account_id = 'D-F-8CX7PGX' AND user_id = 123e4567-e89b-12d3-a456-426614174000;

I would expect Cassandra to return quickly with no rows found rather than scan the full table.

Someone suggested that inserting a dummy row with (account_id = 'D-F-8CX7PGX' AND user_id = '00000000-0000-0000-0000-000000000000') could avoid the full table scan, but I don't understand why it would be needed.

Has anyone encountered a similar issue?

Upvotes: 1

Views: 336

Answers (1)

Erick Ramirez

Reputation: 16293

A single partition query does not do a full table scan.

Since the partition key is (account_id, user_id) and your query restricts both columns with an equality, it targets a single partition. Cassandra will attempt to retrieve that partition from the relevant replica(s) without scanning the whole table, whether or not the partition exists. Cheers!
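
If you want to check this yourself, here is a rough sketch for cqlsh, assuming the profile table from the question. TRACING ON is a cqlsh command that prints the read path for each statement that follows. The second query is only there for contrast and is my own illustration, not something taken from your post: it restricts just part of the partition key, so Cassandra rejects it unless ALLOW FILTERING is added, and with that clause it does have to scan across partitions.

```cql
-- In cqlsh: print the read path taken by each subsequent statement.
TRACING ON;

-- Both partition key columns are restricted by equality,
-- so this is a single-partition read (uuid literal written unquoted).
SELECT * FROM profile
WHERE account_id = 'D-F-8CX7PGX'
  AND user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Contrast (illustrative only): just part of the partition key is restricted.
-- Without ALLOW FILTERING this is rejected; with it, all partitions are scanned.
SELECT * FROM profile
WHERE account_id = 'D-F-8CX7PGX'
ALLOW FILTERING;
```

If the trace for your real query looks like the second case rather than the first, it is worth checking whether the application is actually sending both key columns.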

Upvotes: 1
