Reputation: 3262
I tried to understand secondary indexes in Cassandra using the following link:
Let's say we have a 5-node cluster (N1, N2, N3, N4 and N5) with a replication factor of 3, which means each partition's data will be replicated to 3 nodes in the cluster (say N1, N2 and N3).
Now when I execute this query:
SELECT *
FROM user
WHERE partitionKey = 'somedata' AND ClusteringKey = 'test';
with the read consistency level set to TWO,
it will query any two of the nodes N1, N2 and N3.
If I create a secondary index on one of the columns, on how many nodes will the following query be executed?
SELECT *
FROM user
WHERE partitionKey = 'somedata' AND secondaryKey = 'test';
I have two questions about this:
Upvotes: 1
Views: 1452
Reputation: 1865
Cassandra will contact nodes until it has found enough rows matching your query to satisfy the LIMIT, or until it has contacted all nodes. It does this in rounds: one node in the first round, two nodes in the second round, four nodes in the third round, and so on, starting with the node that contains the first token.
You can check the complete algorithm in this article (section E): https://www.datastax.com/dev/blog/cassandra-native-secondary-index-deep-dive
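To make the round-by-round behaviour concrete, here is a minimal Python sketch of the doubling described above. It is only a simplified model, not Cassandra's actual implementation: the function name `nodes_contacted_per_round` and the `rows_per_node` list (how many matching rows each node would return, in token order) are made up for illustration.

```python
def nodes_contacted_per_round(rows_per_node, limit):
    """Simplified model: ask 1 node, then 2, then 4, ... until `limit`
    matching rows are found or every node has been asked."""
    rounds = []      # number of nodes contacted in each round
    found = 0        # matching rows collected so far
    next_node = 0    # index of the next node in token order
    batch = 1        # how many nodes to contact in the current round
    while found < limit and next_node < len(rows_per_node):
        contacted = rows_per_node[next_node:next_node + batch]
        found += sum(contacted)
        rounds.append(len(contacted))
        next_node += len(contacted)
        batch *= 2   # double the fan-out for the next round
    return rounds, min(found, limit)


# Hypothetical 5-node cluster; only the 2nd and 4th nodes hold matching rows.
rounds, rows = nodes_contacted_per_round([0, 1, 0, 2, 0], limit=3)
print(rounds, rows)
```

With a small LIMIT and matches on the first node, the scan stops after one round; with sparse matches it ends up touching every node.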
One thing to look out for when using secondary indexes is an indexed column with high cardinality: this creates massive indexes and hence uses a lot of disk space. Avoid using secondary indexes on such columns.
Upvotes: 4
Reputation: 14077
To fill in the discussion from the comments:
Both of the queries above will be executed on two nodes because you're supplying the partition key. With the partition key, the Cassandra query engine knows exactly which nodes hold that data.
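As a rough illustration of why the partition key is enough to locate the data, here is a small Python sketch of a token ring. It rests on several assumptions that are not from the question: one token per node (no vnodes), MD5 as a stand-in hash (Cassandra's default partitioner uses Murmur3), and simply taking the first replicas in ring order, whereas a real coordinator chooses among replicas by other criteria. The helpers `token`, `replicas` and `nodes_to_query` are hypothetical names.

```python
import hashlib
from bisect import bisect_left

NODES = ["N1", "N2", "N3", "N4", "N5"]


def token(value: str) -> int:
    # Stand-in hash for illustration only (assumption: not Murmur3).
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


# One token per node, sorted to form the ring.
RING = sorted((token(n), n) for n in NODES)


def replicas(partition_key: str, rf: int = 3):
    """Walk the ring clockwise from the key's token and take `rf` nodes."""
    tokens = [t for t, _ in RING]
    start = bisect_left(tokens, token(partition_key)) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]


def nodes_to_query(partition_key: str, rf: int = 3, consistency: int = 2):
    # Simplification: take the first `consistency` replicas in ring order.
    return replicas(partition_key, rf)[:consistency]


print(replicas("somedata"))        # the 3 nodes holding the partition
print(nodes_to_query("somedata"))  # 2 of them, for consistency level TWO
```

The point is only that the set of candidate nodes is computed from the partition key alone, so nodes outside that set never need to be asked.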
If you were to run the following query:
SELECT *
FROM user
WHERE secondaryKey = 'test';
This would run on every node that holds data for your table, and each node would have to search its own data by that secondary key.
Like I said, secondary indexes are local to each node. Say you have a users table and your data looks something like this:
user_id    user_name
---------------------------
1          a_very_cool_user
2          a_very_cooler_user
3          the_coolest_user
So if we split this data into three partitions, assume that each of the three nodes holds only one row.
If you were to index the user_name field, then node 1 would have indexed only a_very_cool_user and would not know what's in the other two nodes. The same applies to the other nodes. That's what local secondary indexes do in Cassandra.
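The same idea as a toy Python model, assuming one row per node as in the example above. The node names, the `build_local_index` helper and the `query_by_user_name` function are invented for illustration and do not correspond to any Cassandra API.

```python
from collections import defaultdict

# Hypothetical placement: each node stores exactly one row of the users table.
nodes = {
    "node1": [{"user_id": 1, "user_name": "a_very_cool_user"}],
    "node2": [{"user_id": 2, "user_name": "a_very_cooler_user"}],
    "node3": [{"user_id": 3, "user_name": "the_coolest_user"}],
}


def build_local_index(rows, column):
    """Index only the rows stored on one node: value -> list of user_ids."""
    index = defaultdict(list)
    for row in rows:
        index[row[column]].append(row["user_id"])
    return dict(index)


# Every node builds its own index and knows nothing about the others.
local_indexes = {
    name: build_local_index(rows, "user_name") for name, rows in nodes.items()
}


def query_by_user_name(value):
    """Without a partition key, each node's local index must be consulted."""
    hits = []
    for name in local_indexes:
        hits.extend(local_indexes[name].get(value, []))
    return hits


print(local_indexes["node1"])
print(query_by_user_name("the_coolest_user"))
```

Node 1's index contains only its own row, which is why a query on user_name alone has to fan out to the other nodes.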
Upvotes: 2