Henrique Mazer

Different possible search parameters in Cassandra

I'm new to Cassandra. After banging my head against the wall for a few days, things began to make sense, except for one thing: do I always need the PK to perform a query? 🤔

So, I considered the following scenario: let's say I'm building a recipes website with thousands of recipes. Each recipe has a title, a list of ingredients, and a type (breakfast, lunch, dessert, etc.). I want a search field where I can look up recipes using any of these three parameters. I can't make all three parameters the PK, because then I wouldn't be able to search using only one of them (the same applies if I made one of them the PK and the others clustering keys). I understand that secondary indexes are not a great idea. So, if I want to be able to query recipes by their ingredients, I'll have to create an ingredients table where each row holds a recipeId followed by its list of ingredients, right?

But then, how would I query by ingredient and sort by rating? Should I add the rating to the ingredients table? Should I be duplicating data that much?

If I wanted to query by both ingredient and type, would I need to perform two separate queries and compare the results?

For the users table: at login I would need to find a user by email. Other users, however, would search for their friends by name. So do I need separate tables for login credentials and for user profiles?

Basically, what I have to do is create lots of tables. Is this expected? Is this advisable?

Should I somehow integrate MySQL and Cassandra?

Thanks in advance.

Answers (1)

Randy Lynn

Henrique, you're touching on a lot of the most fundamental concepts of Cassandra (C* hereafter).

1) The Partition Key (what you refer to as the PK), whether composite or not, is what determines where your data is stored in a C* cluster. The partitioner determines how the values of your Partition Key are converted to tokens. Each node in the cluster is responsible for part of the token range. So when you query by Partition Key, you're essentially telling C* which node (or nodes, with replication) in the ring to get your data from.
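
To make the partition-key-to-node idea concrete, here is a toy model in Python. It is an assumption-laden sketch, not how Cassandra actually computes tokens: the default Murmur3Partitioner hashes keys with Murmur3 into a signed 64-bit range, whereas this sketch uses MD5 (loosely in the spirit of the older RandomPartitioner). The node names and token assignments are hypothetical, and vnodes and replication are ignored.

```python
from __future__ import annotations

import bisect
import hashlib

TOKEN_SPACE = 2 ** 127  # toy token range [0, 2**127)


def token(partition_key: str) -> int:
    """Hash a partition key value to a token (MD5-based toy, not Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


class Ring:
    """One token per node; no vnodes, no replication (simplifying assumptions)."""

    def __init__(self, node_tokens: dict[str, int]) -> None:
        self._entries = sorted((t, node) for node, t in node_tokens.items())
        self._tokens = [t for t, _ in self._entries]

    def owner_of_token(self, t: int) -> str:
        # A node owns (previous node's token, its own token]; tokens above
        # the highest node token wrap around to the node with the lowest one.
        i = bisect.bisect_left(self._tokens, t)
        if i == len(self._tokens):
            i = 0
        return self._entries[i][1]

    def owner(self, partition_key: str) -> str:
        """Which node a query on this partition key would be routed to."""
        return self.owner_of_token(token(partition_key))


# Hypothetical three-node ring.
ring = Ring({"node-a": 2 ** 125, "node-b": 2 ** 126, "node-c": 3 * 2 ** 125})
for key in ["order-1", "order-2", "order-3"]:
    print(key, token(key), ring.owner(key))
```

The point is that routing needs the full partition key value: without it there is no token to compute, and so no single node to ask.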

Do not consider using the ByteOrderedPartitioner to get range queries. See this answer: Cassandra ByteOrderedPartitioner

2) Design by query. The book "Cassandra: The Definitive Guide: Distributed Data at Web Scale" has an excellent section on data modeling. It would be time well spent to read one of the C* books on data modeling.

Take, for example, a system where you have orders and line items. NOTE: I'm not necessarily advocating using Cassandra for an ordering system; it's just an easy relational model to understand.

Your user wants to get all orders with their items, so you might build a table like:

CREATE TABLE orders_to_items (o_id uuid, item_id uuid, PRIMARY KEY ((o_id), item_id));

If you also want to see all the orders that an item has been added to, then you would need a separate table:

CREATE TABLE items_on_orders (item_id uuid, o_id uuid, PRIMARY KEY ((item_id), o_id));

So you can see that these two separate queries end up generating two separate tables.
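
A minimal in-memory sketch of the two orders/items tables, assuming they are keyed PRIMARY KEY ((o_id), item_id) and PRIMARY KEY ((item_id), o_id). Python dicts stand in for partitions, and the helper names are hypothetical, not a driver API. It shows the application writing each order/item relationship once per query table, after which each query reads a single partition.

```python
from collections import defaultdict

# Stand-ins for the two CQL tables: partition key -> set of clustering values.
orders_to_items = defaultdict(set)  # assumed PRIMARY KEY ((o_id), item_id)
items_on_orders = defaultdict(set)  # assumed PRIMARY KEY ((item_id), o_id)


def add_item_to_order(o_id: str, item_id: str) -> None:
    """Write one relationship to both query tables (the app does this, not C*)."""
    orders_to_items[o_id].add(item_id)
    items_on_orders[item_id].add(o_id)


def items_for_order(o_id: str) -> list:
    # Query 1: a single-partition read from orders_to_items.
    return sorted(orders_to_items.get(o_id, ()))


def orders_for_item(item_id: str) -> list:
    # Query 2: a single-partition read from items_on_orders.
    return sorted(items_on_orders.get(item_id, ()))


add_item_to_order("order-1", "item-a")
add_item_to_order("order-1", "item-b")
add_item_to_order("order-2", "item-a")
add_item_to_order("order-2", "item-a")  # same row again: an upsert, no duplicate

print(items_for_order("order-1"))  # ['item-a', 'item-b']
print(orders_for_item("item-a"))   # ['order-1', 'order-2']
```

Using sets mirrors the fact that writing the same primary key twice in Cassandra overwrites the row instead of adding a second one.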

Anecdotally, here are a couple of quick answers for you:

Yes, denormalize, denormalize. This is what C* is all about.

Do not be tempted by Materialized Views: unless you REALLY understand them, my recommendation is to avoid them.
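
Applied to the recipe question, the denormalize advice might look like the sketch below. It assumes three hypothetical query tables (names and keys are illustrative, not from the question): each recipe is written to every table that has to answer a query about it, with rating as a clustering column so a partition comes back already sorted. Dicts stand in for tables; nothing here is a Cassandra driver API.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical query tables, one per access pattern, all assumed to use
# CLUSTERING ORDER BY (rating DESC, recipe_id ASC):
#   recipes_by_ingredient:          PRIMARY KEY ((ingredient), rating, recipe_id)
#   recipes_by_type:                PRIMARY KEY ((meal_type), rating, recipe_id)
#   recipes_by_type_and_ingredient: PRIMARY KEY ((meal_type, ingredient), rating, recipe_id)
# Each dict maps partition key -> {(rating, recipe_id): title}.
recipes_by_ingredient = defaultdict(dict)
recipes_by_type = defaultdict(dict)
recipes_by_type_and_ingredient = defaultdict(dict)


@dataclass(frozen=True)
class Recipe:
    recipe_id: str
    title: str
    meal_type: str
    ingredients: tuple
    rating: float


def save_recipe(r: Recipe) -> None:
    """Fan one recipe out to every table that must answer a query about it."""
    row_key = (r.rating, r.recipe_id)
    recipes_by_type[r.meal_type][row_key] = r.title  # title duplicated per table
    for ingredient in r.ingredients:
        recipes_by_ingredient[ingredient][row_key] = r.title
        recipes_by_type_and_ingredient[(r.meal_type, ingredient)][row_key] = r.title


def top_rated(table, partition_key, limit=10):
    """Read one partition in clustering order: rating DESC, then recipe_id ASC."""
    rows = table.get(partition_key, {})
    ordered = sorted(rows.items(), key=lambda kv: (-kv[0][0], kv[0][1]))
    return [title for _, title in ordered[:limit]]


save_recipe(Recipe("r1", "Pancakes", "breakfast", ("egg", "flour", "milk"), 4.2))
save_recipe(Recipe("r2", "Omelette", "breakfast", ("egg", "cheese"), 4.8))
save_recipe(Recipe("r3", "Pound cake", "dessert", ("egg", "flour", "butter"), 4.5))

print(top_rated(recipes_by_ingredient, "egg"))  # ['Omelette', 'Pound cake', 'Pancakes']
print(top_rated(recipes_by_type, "breakfast"))  # ['Omelette', 'Pancakes']
print(top_rated(recipes_by_type_and_ingredient, ("dessert", "flour")))  # ['Pound cake']
```

The composite partition key (meal_type, ingredient) is one way to answer "by ingredient and type" with a single read instead of two queries compared on the client. The trade-off: because rating is a clustering column in this layout, a real table cannot UPDATE it in place, so changing a recipe's rating would mean deleting and re-inserting its row in every one of these tables.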

3) For your search use case (search by name), you may want to consider an additional tool like Lucene on top of Cassandra to perform the "search" you describe. I have seen good real-world success with the Stratio Lucene plugin for Cassandra.

NOTE: I operate a 9-node, 3.11.2 C* cluster in AWS.
