Reputation: 20126
We have a distributed system that persists records with 'url' as the primary key. Because multiple servers are collecting data, the 'url' is a convenient and accurate means of guaranteeing uniqueness. Our system currently queries documents as frequently as 10,000 times per minute.
We would like to add another unique key, a 'uuid', so that we can refer to resources as:
http://example.com/fju98hfhsiu
Rather than, for example:
http://example.com/?u=http%3A%2F%2Fthis.is.a.long.url.com%2Fthis_is%2Fa%2Fpagewitha%2Flong-url.html
It seems that creating a secondary index on the UUIDs is not ideal in Cassandra. Is there any way to avoid creating a secondary index on the UUIDs in Cassandra?
Upvotes: 1
Views: 183
Reputation: 2466
Let's start with the fact that the best practice and main pattern in Cassandra is to create tables for queries, not queries for tables. If you need to create an index on a table, that is almost automatically an anti-pattern. Based on this, the simplest solution is to use 2 tables with 2 keys: one keyed by 'url' and one keyed by 'uuid'.
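To make the two-table idea concrete, here is a minimal sketch that models both tables as plain Python dictionaries. The table names (`docs_by_url`, `url_by_short_id`), the `TwoTableStore` class, and the CQL in the comments are all my assumptions for illustration, not something from your schema. In a real cluster the two writes would go through your Cassandra driver, possibly in a logged batch.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed CQL for the two tables (illustrative names only):
#   CREATE TABLE docs_by_url     (url text PRIMARY KEY, short_id text, body text);
#   CREATE TABLE url_by_short_id (short_id text PRIMARY KEY, url text);


@dataclass
class TwoTableStore:
    """In-memory stand-in for two query-specific tables."""
    docs_by_url: dict = field(default_factory=dict)
    url_by_short_id: dict = field(default_factory=dict)

    def save(self, url: str, short_id: str, body: str) -> None:
        # Write to both tables so each query hits its own partition key.
        self.docs_by_url[url] = {"short_id": short_id, "body": body}
        self.url_by_short_id[short_id] = url

    def get_by_url(self, url: str) -> Optional[dict]:
        # Existing access path: a single lookup by 'url'.
        return self.docs_by_url.get(url)

    def get_by_short_id(self, short_id: str) -> Optional[dict]:
        # New access path: resolve the short id to a URL, then read the document.
        url = self.url_by_short_id.get(short_id)
        return None if url is None else self.docs_by_url.get(url)


store = TwoTableStore()
store.save("http://this.is.a.long.url.com/page.html", "fju98hfhsiu", "content")
doc = store.get_by_short_id("fju98hfhsiu")
```

The trade-off is one extra write per record and one extra read when looking up by short id, in exchange for never touching a secondary index.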
In your case, the "uuid" is not really a UUID; I believe it is some concatenation of the domain and a hash of the rest of the URL. If your application can generate this key at request time, you can just use it as the partition key and the full URL as the clustering key.
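If you go the "generate the key from the URL" route, something like the following could work. This is only a sketch under my own assumptions: the `short_key` helper, the choice of SHA-256, and the 11-character length are all hypothetical, and your real key scheme may differ.

```python
import base64
import hashlib
from urllib.parse import urlsplit


def short_key(url: str, length: int = 11) -> tuple:
    """Return (domain, token), where token is a truncated hash of the rest of the URL."""
    parts = urlsplit(url)
    # Everything after the domain: the path plus the query string, if any.
    rest = parts.path + ("?" + parts.query if parts.query else "")
    digest = hashlib.sha256(rest.encode("utf-8")).digest()
    # URL-safe Base64 keeps the token usable in a path; strip '=' padding, then truncate.
    token = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")[:length]
    return parts.netloc, token


domain, token = short_key(
    "http://this.is.a.long.url.com/this_is/a/pagewitha/long-url.html"
)
```

Because the token is truncated, two different URLs can in principle end up with the same token. Keeping the full URL as the clustering key means such rows still stay distinct inside the partition.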
Also, if there are no hot domains (for example http://example.com), you can use the domain as the partition key and the hash and the long URL as clustering keys, creating materialized views to support the different queries.
In the end, you can also just add the secondary index and measure the performance impact in your specific case. If it works for you and you don't want to deal with 2 tables, materialized views, etc., just use it.
Upvotes: 1