Cassandra order by multiple columns

Question

I have an employee table in Cassandra.

CREATE TABLE employee (
    emp_id       text,
    joining_date TIMESTAMP,
    salary       double,
    first_name   text,
    dept         text,
    last_name    text,
    PRIMARY KEY (dept, emp_id));

I need to be able to sort my CQL query results by different columns, i.e., I need support for all of the queries below. Is there a way to achieve this in native Cassandra?

select * from employee order by emp_id;
select * from employee order by joining_date;
select * from employee order by salary;
select * from employee order by first_name;
etc.

Citrullin · Accepted Answer

You can't order by arbitrary columns in a SELECT statement. The sort order is fixed when you create the table: ORDER BY only works on clustering columns, and only in queries that restrict the partition key. The reason is simple: sorting at read time is a performance killer, and Cassandra is built for fast writes. It writes data very efficiently in the order you define, so reading it back in that order is cheap.

Cassandra's ordering is based on the primary key. The first part of the primary key is the partition key, and choosing the right partition key is really important! All rows with the same partition key are stored together on the same node (and its replicas). That means filtering rows within one partition performs well, while filtering rows across partitions is really slow. But you also can't put everything into only one or two partitions: if you do, the data is no longer spread across the cluster and you lose the benefits of Cassandra.

The remaining parts of the primary key are the clustering columns. Within a partition, Cassandra sorts your data by the clustering columns, so in your example each dept partition is sorted only by emp_id. If you need more than one order, create a new column family (table) for each one. In your case, you can create these tables:
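A minimal sketch of what this means for the existing employee table. The department value 'Engineering' is a hypothetical example. ORDER BY is accepted only on the clustering column (emp_id) and only when the partition key (dept) is restricted; the direction can be flipped because Cassandra can read a partition backwards.

```cql
-- Works: partition key restricted, ORDER BY on the clustering column.
SELECT * FROM employee WHERE dept = 'Engineering' ORDER BY emp_id ASC;

-- Works: the reverse of the stored clustering order.
SELECT * FROM employee WHERE dept = 'Engineering' ORDER BY emp_id DESC;

-- Rejected: salary is not a clustering column of this table.
-- SELECT * FROM employee WHERE dept = 'Engineering' ORDER BY salary;

-- Rejected: the partition key is not restricted.
-- SELECT * FROM employee ORDER BY emp_id;
```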

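employeeByDeptDate (PRIMARY KEY (dept, joining_date, emp_id))

employeeByDeptSalary (PRIMARY KEY (dept, salary, emp_id))

employeeByDeptFirstName (PRIMARY KEY (dept, first_name, emp_id))

employeeByDeptEmp (PRIMARY KEY (dept, emp_id))

emp_id stays as the last clustering column in each table because joining_date, salary and first_name are not unique. Without it, two employees in the same department with the same value would share a primary key, and the second write would silently overwrite the first.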
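A sketch of one of these tables in full. It assumes last_name is meant to be text (the TIMESTAMP type in the question looks like a typo); the CLUSTERING ORDER, the department value and the sample row are illustrative. emp_id is added as a final clustering column so that employees with equal salaries get distinct rows. Because every query table holds the same data, the application has to write each employee to all of them; a logged batch is one way to keep the copies consistent.

```cql
-- One query table per sort order; this one sorts each department by salary.
CREATE TABLE employeeByDeptSalary (
    dept         text,
    salary       double,
    emp_id       text,
    joining_date timestamp,
    first_name   text,
    last_name    text,   -- assumed text, not TIMESTAMP
    -- emp_id last, so equal salaries in one department don't overwrite each other
    PRIMARY KEY (dept, salary, emp_id)
) WITH CLUSTERING ORDER BY (salary DESC, emp_id ASC);

-- Write the same (hypothetical) employee to the base table and the query table together.
BEGIN BATCH
  INSERT INTO employee (dept, emp_id, joining_date, salary, first_name)
  VALUES ('Engineering', 'e42', '2015-06-01', 85000.0, 'Ada');
  INSERT INTO employeeByDeptSalary (dept, salary, emp_id, joining_date, first_name)
  VALUES ('Engineering', 85000.0, 'e42', '2015-06-01', 'Ada');
APPLY BATCH;

-- Rows come back already sorted by salary (highest first) within the partition.
SELECT first_name, salary FROM employeeByDeptSalary WHERE dept = 'Engineering';
```

The same pattern applies to the date and first-name tables; only the clustering columns change.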

Now you may ask: why do I have to create more than one table? Cassandra is designed for denormalized data, so storing the same data more than once is not a problem; disk storage is cheap. Cassandra 3.0 also added a new feature called materialized views, which lets Cassandra manage the duplicated data for you.
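A sketch of a materialized view that could stand in for a hand-maintained first-name table, assuming Cassandra 3.0 or later and the employee table from the question. The view's primary key must contain every primary key column of the base table (dept, emp_id) plus at most one other column (first_name here), and each of those columns needs an IS NOT NULL filter. Newer releases flag materialized views as experimental and, from 4.0, disable them by default in cassandra.yaml, so check the documentation for your version before relying on them.

```cql
-- Cassandra keeps this view in sync with employee on every write to the base table.
CREATE MATERIALIZED VIEW employeeByDeptFirstName AS
    SELECT * FROM employee
    WHERE dept IS NOT NULL AND first_name IS NOT NULL AND emp_id IS NOT NULL
    PRIMARY KEY (dept, first_name, emp_id)
    WITH CLUSTERING ORDER BY (first_name ASC, emp_id ASC);

-- Query the view like a table: rows are sorted by first_name within the department.
SELECT * FROM employeeByDeptFirstName WHERE dept = 'Engineering';
```

You only write to employee; the view is updated automatically, at the cost of extra work on every base-table write.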
