Reputation: 385
i am no pro in database designing but i got the lucky task to invent a strategy for filtering data. Given is cassandra as the used dbms. My goal is to invent a caching system which is able to return one field from a dataset for a already requested "configuration" of parameters. If this explicit "configuration" was never requested before, a new query will be triggered in background, but this mechanism is not part of this topic.
example table
id = 1, color = red, size = 10, shape = oval, price = 55.2
id = 2, color = blue, size = 5, shape = rectangle, price = 33.9
id = 3, color = red, size = 2, shape = triangle, price = 95.7
...
The client requests a red and oval shaped item with the size of 10 and wants to know the price. Because the requested parameters are obviously already in the db, the client should get the answer very fast.
What i want is in SQL something like
SELECT price FROM table where color = red AND size = 10 AND shape = oval
In mySQL not very critical, but how to do this in cassandra?
Im new with cassandra and dont know a good solution for this.
What i already know is
filtering in mySQL with the WHERE clause is possible but slow (for big databases) because the db has to run and search over all fields instead using indexes
cassandra cant filter columns which are not indexed or part of the primary key, except using the ALLOW FILTER option which is also as slow as filtering in relational db's
cassandra supports types like maps so it is possible to save a parameterset to one field like
id = 1, properties = {color => red, size => 10, shape => oval}, price = 55.2
but its not possible to use that as a kind of the primary key. As far as i know the indexing of collections is possible but not recommended because its also slow for big enough database
Another given condition is, that the parameters are not always the same. For example item 1 has color and size attribute and item 8 has just the shape and color attribute. That speaks for a non-relational architecture in my opinion (?) Fact is, that one "configuration" has to exist exactly one time, so the following example shouldnt be possible
id = 2535, color = red, size = 10, shape = oval, price = 55.2
id = 3224, color = red, size = 10, shape = oval, price = 33.9
I had an idea about building a hash for the parameters an use it as the primary key but the problem is, that for example color = red, size = 10, shape = oval
is contentual the same like size = 10, shape = oval, color = red
but the hash is a different.
Maybe the solution is very simple or not possible? I searched a while for answers but dont found them.
I dont want a complete solution but maybe someone can give me tip or a keyword to search for.
Thank you all
Upvotes: 0
Views: 107
Reputation: 3190
In your example you are performing an equality search on three fields to return a product ID.
Create a table specifically for this query. Since the configurations are unique, identify them as the partition key and cluster on the product_id.
create table product_by_color_shape_size (
color text,
size int,
shape text,
product_id uuid, //(or int)
price int,
PRIMARY KEY ((color, size, shape),product_id)
);
For more information on Data Modeling in Cassandra you should check out the DS220 Data Modeling course at Datastax Academy.
Upvotes: 0