Reputation: 4118
As it is known, Cassandra is great in low-cardinality indexes and not so good with high-cardinality ones. My column family contains a field storing URL value. Naturally, searching for this specific value in a big dataset can be slow.
As a solution, I've come up with idea of taking first characters of url and storing them in separate columns, e.g. test.com/abcd would be stored as (ab, test.com/abcd) columns. So that when a search by specific URL value needs to be done, I can narrow it down by 26*26 times by searching the "ab" first and only then looking up exact url in the obtained resultset.
Does it look like a working solution to reduce URL cardinality in Cassandra?
Upvotes: 0
Views: 490
Reputation: 19213
A problem with that is that a sequential scan is going to have to follow after you use the low-cardinality index, in order to finally arrive at the one specific URL queried.
As Chris Shain mentioned, you can build a separate column family to build an inverted index:
Column Family 'people'
ssn | name | url
----- | ------ | ---
1234 | foo | http://example.com/1234
5678 | bar | http://hello.com/world
Column Family 'urls'
url | ssn
------------------------ | ------
http://example.com/1234 | 1234
http://hello.com/world | 5678
The downside is that you need to maintain the integrity of your manual index yourself.
Upvotes: 1
Reputation: 51319
If you need this to be really fast, you probably want to consider having a separate table with the value that you are searching for as the column key. Key prefix searches are usually faster than column searches in BigTable implementations.
Upvotes: 2