Reputation: 479
I am trying to figure out what advantages a compound partition key can provide. Consider the well-known weather station example below.
CREATE TABLE temperature ( state text, city text, event_time timestamp, temperature text, PRIMARY KEY ((state, city),event_time) );
Most of the time I query a single state, for a set of cities and a range of dates, so the query looks like
SELECT * FROM temperature WHERE state = 'NY' AND city IN ('manhattan', 'brooklyn', 'queens') AND event_time > '2016-01-01';
Assume I have a large data set: only a few states (fewer than 1,000), but many cities per state (more than 100M). The data is replicated and distributed across different nodes.
Question: can you compare the differences between the following primary key choices?
PRIMARY KEY ((state, city), event_time)
PRIMARY KEY ((city, state), event_time)
PRIMARY KEY (state, city,event_time)
PRIMARY KEY (zipcode, event_time)
Thank you!
Upvotes: 1
Views: 38
Reputation: 16576
PRIMARY KEY ((state, city), event_time)
PRIMARY KEY ((city, state), event_time)
These two are functionally equivalent. The composite partition key is the combined value of city and state, and you cannot fully specify a partition without both parts. Within a partition, cells are ordered by event_time. You will have up to #State * #City partitions, one per distinct (state, city) pair.
[city, state] -> [event_time_0, event_time_1, event_time_2, event_time_3, ...]
You will be able to write queries like
SELECT * FROM TABLE WHERE CITY = X AND STATE = Y AND event_time (><=) SomeValue
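As a rough illustration of that query shape, assuming the temperature table from the question with PRIMARY KEY ((state, city), event_time), statements along these lines should be accepted. The literal values are made up, and the last statement is commented out because it is expected to fail:

```cql
-- Both partition key columns are given, so exactly one partition is read.
SELECT * FROM temperature
WHERE state = 'NY' AND city = 'brooklyn'
  AND event_time > '2016-01-01';

-- IN on city fans out to one partition per listed city.
SELECT * FROM temperature
WHERE state = 'NY' AND city IN ('manhattan', 'brooklyn', 'queens')
  AND event_time > '2016-01-01';

-- Only part of the partition key is given: Cassandra cannot locate
-- a partition, so this is rejected as written.
-- SELECT * FROM temperature WHERE state = 'NY';
```

The second statement matches the access pattern in the question; each listed city is still a separate partition, possibly on a different node.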
PRIMARY KEY (state, city,event_time)
One partition is made for every state. This is probably bad: there are only on the order of hundreds of states/provinces, so you will end up with a very small number of very large partitions. Data is laid out within each partition by city and then by event_time.
[Illinois] --> [Chicago, 0], [Chicago, 1], [Peoria, 0], [Peoria, 1]
Queries will have to restrict city if they are also restricting event time.
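A minimal sketch of that restriction, assuming a hypothetical table named temperature_by_state (the name and the literal values are not from the question) that uses PRIMARY KEY (state, city, event_time):

```cql
-- Hypothetical table: state is the partition key,
-- city and event_time are clustering columns.
CREATE TABLE temperature_by_state (
    state text,
    city text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (state, city, event_time)
);

-- Partition key only: reads the whole state partition.
SELECT * FROM temperature_by_state WHERE state = 'IL';

-- city is restricted by equality, so event_time may take a range.
SELECT * FROM temperature_by_state
WHERE state = 'IL' AND city = 'Chicago'
  AND event_time > '2016-01-01';

-- Skips city but restricts event_time: rejected as written, because
-- clustering columns must be restricted in declaration order.
-- SELECT * FROM temperature_by_state
-- WHERE state = 'IL' AND event_time > '2016-01-01';
```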
PRIMARY KEY (zipcode, event_time)
You will have up to 10k partitions, each with a single cell for each event time.
Upvotes: 1