I have one table having the columns like below: symbol region country location date count I created this table like below: CREATE TABLE IF NOT EXISTS INFO (symbol varchar, region varchar, country varchar, location varchar, date date,count varint, PRIMARY KEY(symbol,date)); Now I have the query set which needs to supports to this table: select * from info where symbol='AAA'; select * from info where date='2017-01-01'; select * from info where count < 5; select * from info where country='XXX'; select * from info where location='XYZ'; select * from info where region='PQR'; These all queries are not working. In simple words, I want table structure which supports all or any number of columns in the where clause. Is it possible to do this in Cassandra?

Reputation: 2904

How to create table structure in cassandra, which supports any column in where cluase?

I have one table having the columns like below:

symbol
region
country
location
date
count

I created this table like below:

CREATE TABLE IF NOT EXISTS INFO (symbol varchar, region varchar, country varchar, location varchar, date date,count varint, PRIMARY KEY(symbol,date));

Now I have the query set which needs to supports to this table:

select * from info where symbol='AAA';
select * from info where date='2017-01-01';
select * from info where count < 5;
select * from info where country='XXX';
select * from info where location='XYZ';
select * from info where region='PQR';

These all queries are not working.

In simple words, I want table structure which supports all or any number of columns in the where clause.

Is it possible to do this in Cassandra?

Upvotes: 0

Answers (1)

Simon Fontana Oscarsson

Reputation: 2134

It looks to me like you need to learn some Cassandra data modelling. I recommend you go to https://academy.datastax.com/courses and watch some courses (more specifically DS210 and DS220), they're free after a simple registration. This is in my opinion the best way to learn Cassandra. I know they're long but they're incredibly useful.

To answer your question. You always have to specify the partition key (symbol in your case) in your query and this is why: When you insert data Cassandra will hash the primary key and store the data on the node that is responsible for that hash (this is called range). So if you have 1000 nodes in your cluster and you run one of the SELECT queries you specified then how will Cassandra know what node has the data? It is possible to search all nodes for the data you want by using ALLOW FILTERING but as you can imagine this is terrible for performance. Here is a reference for better understanding: https://www.datastax.com/dev/blog/the-most-important-thing-to-know-in-cassandra-data-modeling-the-primary-key

The way to solve this is by creating multiple tables with the same data, but different partition key. Yes, this will make a lot of redundant data but is that really that bad?

The first cost of this will be that you need to buy more disk space. But disk space is cheap so it's not really that big of a problem. CPU is more expensive.

The second cost is that you have to do multiple writes to keep your tables consistent. But compared to SQL databases Cassandra is extremely fast with writing data. Reads are more expensive but that won't matter in your case since you will only read data once anyway.

So how should you do this practically?

In your case you will have to create a new table for each new partition key that you need. That is create 4 new tables with date, country, location and region as partition key.

For the select statement with count < 5 it gets a little more complicated. Like I stated before Cassandra wants to know exactly what partition your data is located in. So making count a partition key won't really help. You need to have a primary key specified in your query as well. Like this:

select * from info where symbol='AAA' AND count < 5;

However, since count isn't a clustering key this won't work either. A clustering is used to sort your data inside of a partition. You can have as many clustering keys as you want in your table. The clustering key is part of the primary key. The first part of the primary key is ALWAYS the partition key. All that comes after are clustering keys.

CREATE TABLE IF NOT EXISTS INFO (symbol varchar, region varchar, country varchar, location varchar, date date,count varint, PRIMARY KEY(symbol,date,count,));

I know this is all confusing for a beginner but just remember that Cassandra is not a SQL database. Try watching some of the videos I linked and read about the different concepts on the Datastax documentation (it's still alot better than the official Cassandra documentation).

Here is a glossary for some of the terms I just used: https://docs.datastax.com/en/glossary/doc/glossary/glossaryTOC.html

Upvotes: 2

How to create table structure in cassandra, which supports any column in where cluase?

Answers (1)

Related Questions