harrow
harrow

Reputation: 385

Strategy for filtering big data in cassandra db

i am no pro in database designing but i got the lucky task to invent a strategy for filtering data. Given is cassandra as the used dbms. My goal is to invent a caching system which is able to return one field from a dataset for a already requested "configuration" of parameters. If this explicit "configuration" was never requested before, a new query will be triggered in background, but this mechanism is not part of this topic.

example table

id = 1, color = red, size = 10, shape = oval, price = 55.2
id = 2, color = blue, size = 5, shape = rectangle, price = 33.9
id = 3, color = red, size = 2, shape = triangle, price = 95.7

...

The client requests a red and oval shaped item with the size of 10 and wants to know the price. Because the requested parameters are obviously already in the db, the client should get the answer very fast.

What i want is in SQL something like

SELECT price FROM table where color = red AND size = 10 AND shape = oval

In mySQL not very critical, but how to do this in cassandra?

Im new with cassandra and dont know a good solution for this.

What i already know is

Another given condition is, that the parameters are not always the same. For example item 1 has color and size attribute and item 8 has just the shape and color attribute. That speaks for a non-relational architecture in my opinion (?) Fact is, that one "configuration" has to exist exactly one time, so the following example shouldnt be possible

id = 2535, color = red, size = 10, shape = oval, price = 55.2
id = 3224, color = red, size = 10, shape = oval, price = 33.9

I had an idea about building a hash for the parameters an use it as the primary key but the problem is, that for example color = red, size = 10, shape = oval is contentual the same like size = 10, shape = oval, color = red but the hash is a different.

Maybe the solution is very simple or not possible? I searched a while for answers but dont found them.

I dont want a complete solution but maybe someone can give me tip or a keyword to search for.

Thank you all

Upvotes: 0

Views: 107

Answers (1)

Nathan
Nathan

Reputation: 3190

In your example you are performing an equality search on three fields to return a product ID.

Create a table specifically for this query. Since the configurations are unique, identify them as the partition key and cluster on the product_id.

create table product_by_color_shape_size (
    color text,
    size int,
    shape text,
    product_id uuid, //(or int)
    price int,
    PRIMARY KEY ((color, size, shape),product_id)
);

For more information on Data Modeling in Cassandra you should check out the DS220 Data Modeling course at Datastax Academy.

Upvotes: 0

Related Questions