AlexA

Reputation: 4118

Cassandra, implementing high-cardinality indexes

It is well known that Cassandra's indexes work well for low-cardinality columns and not so well for high-cardinality ones. My column family contains a field that stores a URL. Naturally, searching for one specific URL in a big dataset can be slow.

As a solution, I've come up with the idea of taking the first characters of the URL path and storing them in a separate column, e.g. test.com/abcd would be stored as the columns (ab, test.com/abcd). When I need to search for a specific URL, I can then narrow the search by a factor of 26*26 by looking up "ab" first and only then looking for the exact URL in the result set.

Does this look like a workable way to reduce URL cardinality in Cassandra?

Upvotes: 0

Views: 490

Answers (2)

kizzx2

Reputation: 19213

The problem with that approach is that the low-cardinality index only narrows the search: a sequential scan over the matching rows still has to follow before you arrive at the one specific URL queried.
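To make the cost concrete, here is a minimal in-memory sketch of the prefix idea. It is not Cassandra code: `prefix_index`, `path_prefix`, `add_url` and `find_url` are hypothetical names, a plain dict stands in for the low-cardinality index, and URLs are assumed to have the `host/path` shape used in the question.

```python
from collections import defaultdict

# Hypothetical stand-in for the low-cardinality index:
# two-character path prefix -> every URL sharing that prefix.
prefix_index = defaultdict(list)

def path_prefix(url, length=2):
    """Return the first `length` characters after the first '/'.

    Assumes URLs shaped like 'test.com/abcd' (no scheme), as in the question.
    """
    _, _, path = url.partition("/")
    return path[:length]

def add_url(url):
    prefix_index[path_prefix(url)].append(url)

def find_url(url):
    """Narrow by prefix, then scan the bucket sequentially for the exact URL.

    Returns (found, number_of_candidates_scanned).
    """
    bucket = prefix_index.get(path_prefix(url), [])
    scanned = 0
    for candidate in bucket:
        scanned += 1
        if candidate == url:
            return True, scanned
    return False, scanned

for u in ["test.com/abcd", "test.com/abzz", "test.com/abcq", "test.com/xy12"]:
    add_url(u)

found, scanned = find_url("test.com/abcq")
print(found, scanned)
```

The index lookup itself is cheap, but the loop inside `find_url` grows with the size of the bucket, and real URL prefixes are unlikely to spread evenly across all buckets.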

As Chris Shain mentioned, you can build a separate column family that serves as an inverted index:

Column Family 'people'

ssn   | name | url
----- | ---- | -----------------------
1234  | foo  | http://example.com/1234
5678  | bar  | http://hello.com/world

Column Family 'urls'

url                      | ssn   
------------------------ | ------
http://example.com/1234  | 1234   
http://hello.com/world   | 5678   

The downside is that you need to maintain the integrity of your manual index yourself.
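A rough sketch of what "maintaining the integrity yourself" means, using plain dicts as stand-ins for the two column families. The helper names (`put_person`, `delete_person`, `find_by_url`) are hypothetical and this is not a Cassandra client; it only shows that every write path has to touch both structures.

```python
# Hypothetical in-memory stand-ins for the two column families above.
people = {}  # ssn -> {"name": ..., "url": ...}
urls = {}    # url -> ssn (the manual inverted index)

def put_person(ssn, name, url):
    """Write a row and keep the inverted index in step with it."""
    old = people.get(ssn)
    if old is not None and old["url"] != url:
        # Drop the stale index entry, otherwise the old URL would still resolve.
        urls.pop(old["url"], None)
    people[ssn] = {"name": name, "url": url}
    urls[url] = ssn

def delete_person(ssn):
    """Delete a row together with its index entry."""
    row = people.pop(ssn, None)
    if row is not None:
        urls.pop(row["url"], None)

def find_by_url(url):
    """Look up by URL: one read on the index, one read on the main data."""
    ssn = urls.get(url)
    return None if ssn is None else people.get(ssn)

put_person("1234", "foo", "http://example.com/1234")
put_person("5678", "bar", "http://hello.com/world")
put_person("5678", "bar", "http://hello.com/moved")  # URL change: index must follow

print(find_by_url("http://hello.com/moved"))
print(find_by_url("http://hello.com/world"))
```

In a real cluster the data write and the index write would be separate operations, so your application also has to decide what happens when one succeeds and the other fails.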

Upvotes: 1

Chris Shain

Reputation: 51319

If you need this to be really fast, you probably want to consider having a separate table with the value that you are searching for as the row key. Key prefix searches are usually faster than column searches in BigTable implementations.
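As an illustration of why a key prefix search can be cheap, here is a small sketch that assumes the keys are kept in sorted order (which in Cassandra depends on the partitioner you use). The list `row_keys` and the function `keys_with_prefix` are hypothetical; a sorted Python list stands in for the ordered key space.

```python
from bisect import bisect_left

# Hypothetical sorted row keys, standing in for an ordered key space.
row_keys = sorted([
    "http://hello.com/world",
    "http://example.com/1234",
    "http://another.org/x",
    "http://example.com/9999",
])

def keys_with_prefix(prefix):
    """Binary-search to the first key >= prefix, then read forward
    only while keys still start with the prefix."""
    matches = []
    i = bisect_left(row_keys, prefix)
    while i < len(row_keys) and row_keys[i].startswith(prefix):
        matches.append(row_keys[i])
        i += 1
    return matches

print(keys_with_prefix("http://example.com/"))
```

Because matching keys sit next to each other in sorted order, the lookup touches only the matching range instead of every row.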

Upvotes: 2
