Jay
Jay

Reputation: 20126

Feasability of using UUID as a secondary key for a table of URL's in Cassandra

Given a distributed system which is persisting records with a primary key being 'url'. Given that multiple servers are collecting data, the 'url' is a handy/convenient and accurate means of guaranteeing uniqueness. Our system queries documents by as frequently as 10,000 times per minute at the moment.

We would like to add another unique key, being a 'uuid' so that we can refer to resources as:

http://example.com/fju98hfhsiu

Rather than, for example:

http://example.com/?u=http%3A%2F%2Fthis.is.a.long.url.com%2Fthis_is%2Fa%2Fpagewitha%2Flong-url.html

It seems that creation of secondary index of UUID's is not ideal in cassandra. Is there any way to avoid creating a secondary index of UUID's in cassandra?

Upvotes: 1

Views: 183

Answers (1)

nevsv
nevsv

Reputation: 2466

Let's start with the fact, that best practice and the main pattern of Cassandra is to create tables for queries, and not queries for tables, if you need to create index on table, it is "auto" anti pattern. Based on this, the simplest solution is just to use 2 tables with 2 keys.

In your case, the "uuid", is not UUID, it is some concatenation of domain and hash, of the rest of the URL i believe .If your application can generate this key on the time of request, you can just use it as the partition key, and the full URL as clustering key.

Also, if there is no hot domains,(for example http://example.com) you can use the domain as the partition key, and hash and long urls as clustering keys, creating materialized views to support different queries.

In the end, just add secondary index and see performance impact in your specific case. If it works for you, and you don't want do deal with 2 tables, materialized views etc, just use it.

Upvotes: 1

Related Questions