user1189332
user1189332

Reputation: 1941

Cassandra Modelling for Date Range

Cassandra Newbie here. Cassandra v 3.9.

I'm modelling the Travellers Flight Checkin Data.

My Main Query Criteria is Search for travellers with a date range (max of 7 day window).

Here is what I've come up with with my limited exposure to Cassandra.

create table IF NOT EXISTS travellers_checkin (checkinDay text, checkinTimestamp bigint, travellerName text, travellerPassportNo text, flightNumber text, from text, to text, bookingClass text, PRIMARY KEY (checkinDay, checkinTimestamp)) WITH CLUSTERING ORDER BY (checkinTimestamp DESC)

Per day, I'm expecting upto a million records - resulting in the partition to have a million records.

Now my users want search in which the date window is mandatory (max a week window). In this case should I use a IN clause that spans across multiple partitions? Is this the correct way or should I think of re-modelling the data? Alternatively, I'm also wondering if issuing 7 queries (per day) and merging the responses would be efficient.

Upvotes: 0

Views: 170

Answers (1)

Ashraful Islam
Ashraful Islam

Reputation: 12830

Your Data Model Seems Good.But If you could add more field to the partition key it will scale well. And you should use Separate Query with executeAsync

If you are using in clause, this means that you’re waiting on this single coordinator node to give you a response, it’s keeping all those queries and their responses in the heap, and if one of those queries fails, or the coordinator fails, you have to retry the whole thing

enter image description here

Source : https://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/

Instead of using IN clause, use separate query of each day and execute it with executeAsync.

Java Example :

PreparedStatement statement = session.prepare("SELECT * FROM travellers_checkin where checkinDay = ? and checkinTimestamp >= ? and checkinTimestamp <= ?");

List<ResultSetFuture> futures = new ArrayList<>();
for (int i = 1; i < 4; i++) {
    ResultSetFuture resultSetFuture = session.executeAsync(statement.bind(i, i));
    futures.add(resultSetFuture);
}

for (ResultSetFuture future : futures){
     ResultSet rows = future.getUninterruptibly();
     //You get the result set of each query, merge them here
}

Upvotes: 3

Related Questions