ravi
ravi

Reputation: 77

How to Model Table to run query's based on status field which is changing over time

Currently we have a table which we query using shipment_id and in future we have a need to query based on status field Current Table :

CREATE TABLE shipment ( 
    shipment_id text,
    tenant_id text,
    actual_arrival_time text,
    actual_dep_time text,
    email_ids set,
    is_deleted boolean,
    modified_by text,
    modified_time timestamp,
    planned_arrival_time text,
    planned_dep_time text,
    route_id text,
    shipment_departure_date text,
    status_code text,
    PRIMARY KEY (shipment_id, tenant_id) 
); 

CREATE INDEX shipment_id_index ON shipment (tenant_id);

Current Query's

1) SELECT * FROM shipment where tenant_id=?0 ALLOW FILTERING ;

2) SELECT * FROM shipment WHERE shipment_id=?0 and tenant_id=?1 ;

Pending/Future Query's

list of shipment id's for given status code as of now 3) SELECT * FROM shipment WHERE tenant_id = 'y' and status_code = x ? ;

4) list of shipment id's for given status code for last 1 week

5) list of shipment id's for which got delayed

Above table There could be 10-15 unique tenants and will have 1 shipment_id,1 tenant_id 1 row per table and status_code will be changing over the time as shipment progresses , from Shipment_started, shipment_progress, shipment_delayed, shipment_delayed_completed and shipment_completed etc each shipment in its life time will go through 3-5 statuses , the current table will be updated only when there is a status change for a given shipment_id .

I need to create a new table which can address query's something like below

3) list of shipment's for a given tenant which have status_code = 'x' as of now

4) list of shipment's for a given tenant which have status_code = 'x' for last 1 week

5) list of shipment's for which got delayed ?

Upvotes: 3

Views: 282

Answers (1)

Mohamed Ibrahim Elsayed
Mohamed Ibrahim Elsayed

Reputation: 2974

In Cassandra, you model your tables based on your queries, so you may actually create a table for every query you may execute. Also using ALLOW FILTERING in your queries is something that should be used for development and testing purposes only and not in your actual prodcution application (check the answer here: Cassandra CQLEngine Allow Filtering).

So for each of the cases/queries you mentioned I suggest the following:

1) SELECT * FROM shipment where tenant_id=?0 ALLOW FILTERING;

this should be addressed by the following table:

CREATE TABLE shipment ( 
    tenant_id text,
    shipment_id text,
    actual_arrival_time text,
    actual_dep_time text,
    email_ids set,
    is_deleted boolean,
    modified_by text,
    modified_time timestamp,
    planned_arrival_time text,
    planned_dep_time text,
    route_id text,
    shipment_departure_date text,
    status_code text,
    PRIMARY KEY (tenant_id, shipment_id) 
);

here the tenant_id is the partition key so if you execute your query: SELECT * FROM shipment where tenant_id='x'; then you don't need to use ALLOW FILTERING any more.

Update: I've also added the shipment_id as part of the primary key to handle same cardinality in case the tenant_id is not unique so that the primary key be consisted of both tenant_id and shipment_id to avoid overwriting records with same tenant_id As per @Himanshu Ahire's comment.

2)SELECT * FROM shipment WHERE shipment_id='x' and tenant_id='y';

this should be addressed by the following table:

CREATE TABLE shipment ( 
    shipment_id text,
    tenant_id text,
    actual_arrival_time text,
    actual_dep_time text,
    email_ids set,
    is_deleted boolean,
    modified_by text,
    modified_time timestamp,
    planned_arrival_time text,
    planned_dep_time text,
    route_id text,
    shipment_departure_date text,
    status_code text,
    PRIMARY KEY ((shipment_id, tenant_id)) 
);

here the shipment_id and tenant_id both are used as composite partition key

3) SELECT * FROM shipment WHERE tenant_id = 'y' and status_code = 'x';

4) list of shipment id's for given status code for last 1 week

5) list of shipment id's for which got delayed

these should be addressed by the following table:

CREATE TABLE shipment (
    status_code text,
    tenant_id text,
    shipment_id text,
    actual_arrival_time text,
    actual_dep_time text,
    email_ids set,
    is_deleted boolean,
    modified_by text,
    modified_time timestamp,
    planned_arrival_time text,
    planned_dep_time text,
    route_id text,
    shipment_departure_date text,
    PRIMARY KEY ((tenant_id, status_code), actual_arrival_time) 
) WITH CLUSTERING ORDER BY (actual_arrival_time DESC);

here you should also use both tenant_id and status_code as composite partition key and the actual_arrival_time as clustering column so you can easily create queries like:

3) SELECT * FROM shipment WHERE tenant_id = 'y' and status_code = 'x';

4) SELECT * FROM shipment WHERE tenant_id = 'y' and status_code = 'x' and actual_arrival_time >= 'date of last week';

5) SELECT * FROM shipment WHERE tenant_id = 'y' and status_code = 'x' and actual_arrival_time > planned_arrival_time;

just a note for query number 4 you can send the date of last week from your application code or using cql functions

Upvotes: 2

Related Questions