Reputation: 2436

Modeling Cassandra data with optional query params

I am developing an application which needs to query for entities based on a primary key, and optionally one or more additional attributes (currently 3 but may increase with time). We are at the database modeling phase and it has been proposed to model the data in Cassandra. I am a newb to NoSQL, and in the small amount of research I've been doing, I'm not convinced it's the right tool for the job and thus can use some feedback.

My primary concern are the optional query parameters. If one needs to query, for example, all car details where the (row key) manufacturer = 'GM', then modeling this in Cassandra is simple. However, if one needs to query based on manufacturer AND maybe 'year' AND mabye 'fuel type' AND maybe 'horsepower'... then how does one model this without constructing several tables consisting of composite keys for all the different permutations of query params? And if the query requirements change over time, do we just keep adding more and more tables?

Upvotes: 0

Answers (2)

Shrikar

Reputation: 870

CREATE TABLE vehicles( manufacturer text, year int, fuel_type text, horse_power int, ..... more columns PRIMARY KEY(manufacturer, year, fuel_type, horse_power)

The above table will help you answering queries like

SELECT * from vehicles where manufacturer = 'GM';
SELECT * from vehicles where manufacturer = 'GM' and year = 2017;
SELECT * from vehicles where manufacturer = 'GM' and year = 2017 and vehicle_type='petrol';
SELECT * from vehicles where manufacturer = 'GM' and year = 2017 and vehicle_type='petrol' and horse_power = 750;

Take a look at Materialized views which can help you with creating multiple tables and updating them https://www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views

Upvotes: 0

Pandemonium

Reputation: 8390

In the world of NoSQL, denormalization is not a crime, and with Cassandra it's even a good idea to avoid creating indexes.

Cassandra is really optimized for fast writes, meaning creating redundant "lookup" tables and writing to several is fine but it will introduce more headache application-side.

However, to read efficiently, a query should stick to one partition, since rows on two different partitions can be stored on different nodes. This is why denormalization needs to happen. Otherwise, you can always use secondary indexes as a reverse-lookup mechanism on one or two columns, but it doesn't really scale well.

So the answer is yes, you're encourage to add more tables and columns as requirement changes. Cassandra, under the superficial tables, schemas and CQL is very much like a key-value database.

Upvotes: 1

Modeling Cassandra data with optional query params

Answers (2)

Related Questions