Slow select query in cassandra

Question

I have a cassandra cluster of 3 nodes. There's one table storing approx 400M rows. I am hitting below select query:

SELECT * FROM table_1 WHERE vuid in ('abc','def','ghi');

Above is a sample query. In clause in our production environment has 1000 keys. Below is the table structure

CREATE TABLE dmp.user_profiles_9 (
    vuid text PRIMARY KEY,
    apnid text,
    brand_model text,
    first_seen timestamp,
    ifa text,
    last_seen timestamp,
    msisdn text,
    total_day_count int,
    total_usage_count int,
    user_type text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

With 1000 keys in "in" clause query takes upwards of 5 seconds. We need to keep updating above data set daily. This job involves full table scan. And in order to finish as quick as possible every read should be faster. What could be done in above case?

Slow select query in cassandra

Answers (1)

Related Questions