Henrik
Henrik

Reputation: 169

Two different queries in one line in Cassandra

I have two tables in Cassandra, the first one is "company", where columns are "company_name (string) and company_id (int)" and the second one is "data", where columns are "company_id (int), worker (string) and status (boolean)".

I need to select all workers and their statuses from Data, where company_id=1, but I have to get company_id by using company_name, so I'm executing the first query to get company_id from first table (select company_id from company where compani_name=name1') and I'm getting 1 as query result, e.g.. After that I'm using result of that query for another query to get "worker" and "status" from Data table (select worker from data where company_id=1) and (select status from data where company_id=1).

SO, I would like to know, is there any way to execute these two queries in one line in Cassandra?

Thank you

Upvotes: 0

Views: 1243

Answers (1)

fromanator
fromanator

Reputation: 864

As pointed out in a comment to your question, with your existing data model you are essentially asking how to do a join. However Cassandra doesn't support joins, and it won't (probably ever) in the foreseeable future due to it's design and implementation choices.

Unlike a Relational Database where you design the tables first, then write your queries, in Cassandra it's best to define your queries then create tables that can support those queries. This means you may have several denormalized tables of duplicate data, but Cassandra was designed on the idea of disk is cheap.

If you wanted to keep with the tables you have now, you would want to look into Apache Spark with the Spark Cassandra Connector which provides a computing framework for doing joins on Cassandra tables.

Instead of using Spark, you could create a new denormalized table like this:

CREATE TABLE workers_by_company_name (
  company_name text PRIMARY KEY,
  company_id int,
  worker text,
  status boolean
)

Then you can get all of the information you need with a SELECT * FROM workers_by_company_name WHERE company_name = 'your_company';

And if you were to keep the two other tables, you would want to do an Atomic Batch with 3 insert statements (one for each table) whenever adding or changing this data.

Additional Reading for Data Modeling: http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling

Upvotes: 1

Related Questions