Reputation: 619
I have a scylla table as shown below:
cqlsh:sampleks> describe table test;
CREATE TABLE test (
client_id int,
when timestamp,
process_ids list<int>,
md text,
PRIMARY KEY (client_id, when) ) WITH CLUSTERING ORDER BY (when DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
AND comment = ''
AND compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 172800
AND max_index_interval = 1024
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
And I see this is how we are querying it. It's been a long time I worked on cassandra so this PER PARTITION LIMIT
is new thing to me (looks like recently added). Can someone explain what does this do with some example in a layman language? I couldn't find any good doc on that which explains easily.
SELECT * FROM test WHERE client_id IN ? PER PARTITION LIMIT 1;
Upvotes: 5
Views: 2701
Reputation: 57748
The PER PARTITION LIMIT
clause can be helpful in a "wide partition scenario."
It returns only the first two rows in the partition.
Take this query:
aploetz@cqlsh:stackoverflow> SELECT client_id,when,md
FROM test PER PARTITION LIMIT 2 ;
Considering the PRIMARY KEY definition of (client_id,when)
, that query will iterate over each client_id
. Cassandra will then return only the first two rows (clustered by when
) from that partition, regardless of how many ocurences of when
may be present.
In this case, I inserted 7 rows into your test
table, using two different client_id
s (2 partitions total). Using a PER PARTITION LIMIT
of 2, I get 4 rows returned (2 client_id
x PER PARTITION LIMIT
2) == 4 rows.
client_id | when | md
-----------+---------------------------------+-----
1 | 2020-05-06 12:00:00.000000+0000 | md1
1 | 2020-05-05 22:00:00.000000+0000 | md1
2 | 2020-05-06 19:00:00.000000+0000 | md2
2 | 2020-05-06 01:00:00.000000+0000 | md2
(4 rows)
Upvotes: 6