Reputation: 117
I am using Cassandra 3.x and have the following table:
CREATE TABLE sp_db.prod_tag
(
year int,
name text,
group int,
sn text,
factory_id bigint,
PRIMARY KEY ((year), name, group)
) WITH CLUSTERING ORDER BY (name ASC, group ASC);
I have inserted 1 million rows and I started to perform queries:
Query #1 - on partition key
select count(*)
from sp_db.prod_tag
where year = 2015;
Output:
count |
------|
33328 |
Query time: ~90ms
Query #2 - one of clustering keys
select count(*)
from sp_db.prod_tag
where group = 104 ALLOW FILTERING;
Result:
count |
------|
3938 |
Query time: ~800ms
We don't pass primary key, therefore Cassandra gets all the rows from the table and it have performance impact.
So far so good...
Querying first from clustering key:
select count(*)
from sp_db.prod_tag
where name = '06d7d' ALLOW FILTERING;
Results:
count |
------|
3 |
Query time: ~6ms
And here is my surprise: why only 6ms? There is no primary key, so this query should have poor performance.
Studying documentation I did not find the answer.
Can anyone explain this?
Upvotes: 0
Views: 752
Reputation: 4426
There's no concrete reason why the query on name
should be faster than the query on group
, assuming you have no secondary index on name
.
I suspect, but can not prove, that it's a primarily the result of caching at various layers (including the OS page cache).
Edit: re-read the schema
The data will be sorted on disk sorted first by name, then group, so within each partition, it’ll be able to skip to the relevant block of data by name - yes, this is expected and normal.
Upvotes: 1