saimonsez
saimonsez

Reputation: 344

Dealing with one-to-many in Cassandra

Given the below simplified domain model:

Train Journey

Journey Stations

With this relation:

With this query requirements:

How should this query be modelled in Cassandra?

I was thinking about something like:

CREATE TABLE departures_by_station (
    date_of_yourney date,
    train_name varchar,
    station uuid,
    arrival timestamp,
    departure timestamp,
    primary key((date_of_journey, station), train_name, departure)
);

SELECT * from departures_by_station 
WHERE date_of_journey = '2018-01-02' AND station = 'Paris' AND departure ...;

This will not work because it results in a partial Train Journey - all Journey Stations except the requested one are missing. To make it even worse, arrival & departure times may change frequently. I can't update this table with a new departure time because it is a clustering key.

Any idea or hint how to solve this problem? I guess I'm missing something basic here, but I am very new to the nosql world.

Upvotes: 3

Views: 511

Answers (1)

Alex Ott
Alex Ott

Reputation: 87244

First, it maybe not the most optimal solution (need to do some calculations about partition size, etc.). If you can expire "old" data using TTL, then I thought about having something like this:

CREATE TABLE departures_by_station (
   station uuid,
   departure timestamp,
   train_name varchar,
   arrival timestamp,
   statitions list<uuid>
   primary key(station, departure, train_name)
);

In this case you'll have so-called "wide" partitions - one per station of journey, and because you have departure as clustering key, you can do a range search on it. But if you expire quite a lot of data, then you need to perform "repairing" of tables quite often to get rid of tombstones (deletions markers) because this could affect read performance.

Also, to avoid doing multiple lookups, you need to put information about all stations along journey into every row - I wrote this as list<uuid>, but it could be better modeled as user-defined type, so you can include station name, and other information.

You code also should generate an entry for every station along journey for particular train.

P.S. I recommend to take a DS220 (Data Modeling) course on DataStax Academy.

Upvotes: 1

Related Questions