Reputation: 4811
Can someone explain and show me how the data is laid out when you design your tables for wide vs. skinny rows?
I'm not sure I fully grasp how the data is spread out with a "wide" row.
Is there a difference in how you can fetch the data, or is it the same? That is, if the data is ordered, does it matter whether it is organized vertically (skinny) or horizontally (wide)?
Update: Is a table considered wide if the primary key consists of more than one column? Or will a table have wide rows only if the partition key is a composite partition key?
Upvotes: 1
Views: 867
Reputation: 5180
Wide... Skinny... Terms that make your head explode... I prefer to oversimplify the thing as such:
This allows me to think of it as follows (mangling the C* terminology a bit):
        Number of RECORDS in a partition
1 <---------------------------------------- ... 2 billion
^                                               ^
Skinny rows                                     Wide rows
The fewer records in a partition, the skinnier the partition, and vice versa.
When designing for C* I always keep a couple of things in mind:
One-shot queries such as SELECT * FROM table WHERE username = 'xmas79'; where the table has a primary key in the form of PRIMARY KEY (username). That lets me get all the data belonging to a particular username.
Range queries such as SELECT * FROM table WHERE sensor = 'pressure' AND time >= '2016-09-22'; where the table has a primary key in the form of PRIMARY KEY (sensor, time).
So, the first approach is for one-shot queries and the second for range queries. Beware that the second approach has the (major) drawback that you can keep adding data to the partition, and it will get wider and wider, hurting performance.
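To make the layout difference concrete, here is a toy in-memory sketch. It is not how Cassandra stores data internally; it only models a table as a mapping from partition key to rows, with rows inside a partition kept in clustering-key order. The names `users`, `readings`, `insert_reading` and `readings_since`, and all sample values, are made up for illustration.

```python
from bisect import bisect_left, insort

# Skinny table, PRIMARY KEY (username): one row per partition.
users = {
    "xmas79": {"username": "xmas79", "city": "Rome"},   # sample values are invented
    "alice": {"username": "alice", "city": "Oslo"},
}

# Wide table, PRIMARY KEY (sensor, time): many rows per partition,
# kept sorted by the clustering key (time).
readings = {}

def insert_reading(sensor, time, value):
    # insort keeps the partition's rows ordered by (time, value).
    insort(readings.setdefault(sensor, []), (time, value))

def readings_since(sensor, start):
    rows = readings.get(sensor, [])
    # Binary search to the first row with time >= start, then read sequentially.
    return rows[bisect_left(rows, (start,)):]

insert_reading("pressure", "2016-09-23", 1012.1)
insert_reading("pressure", "2016-09-21", 1009.8)
insert_reading("pressure", "2016-09-22", 1011.4)
insert_reading("humidity", "2016-09-22", 0.45)

one_shot = users["xmas79"]                          # single-row lookup
ranged = readings_since("pressure", "2016-09-22")   # ordered slice of one partition
```

In this model the one-shot query touches exactly one row, while the range query reads a contiguous, already ordered slice of a single partition that keeps growing as readings are added.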
In order to control how wide your partitions are, you need to add something to the partition key. In the sensor example above, provided you don't violate your requirements, you can "group" measurements by date, e.g. split the measurements into day-by-day groups, making the primary key PRIMARY KEY ((sensor, day), time), where the partition key has become (sensor, day). With this approach, you have full (well, let's say at least good) control over the width of your partitions.
You only need to find a good compromise between your query capabilities and the desired performance.
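The day-bucketing idea and its trade-off can be sketched with a toy in-memory model of PRIMARY KEY ((sensor, day), time). This is only an illustration, not Cassandra's storage engine; `partition_key`, `insert_reading` and `readings_between` are hypothetical helpers, and timestamps are assumed to be ISO-8601 strings.

```python
from collections import defaultdict

# Partition key is the pair (sensor, day); clustering key is time.
partitions = defaultdict(list)

def partition_key(sensor, time):
    # Assumes timestamps like "2016-09-22T08:00:00";
    # the first 10 characters are the day bucket.
    return (sensor, time[:10])

def insert_reading(sensor, time, value):
    rows = partitions[partition_key(sensor, time)]
    rows.append((time, value))
    rows.sort()  # keep rows in clustering (time) order

def readings_between(sensor, days, start, end):
    # A range spanning several days has to visit one partition per day.
    result = []
    for day in days:
        for time, value in partitions.get((sensor, day), []):
            if start <= time < end:
                result.append((time, value))
    return result

insert_reading("pressure", "2016-09-22T08:00:00", 1011.4)
insert_reading("pressure", "2016-09-22T20:00:00", 1010.9)
insert_reading("pressure", "2016-09-23T08:00:00", 1012.1)

across_days = readings_between(
    "pressure",
    ["2016-09-22", "2016-09-23"],
    "2016-09-22T12:00:00",
    "2016-09-23T12:00:00",
)
```

Each partition now holds at most one day of readings, so its width is bounded. The price is that a query crossing day boundaries must name every day it touches, which is the compromise between query capabilities and performance.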
I suggest these three readings for further investigation on the details:
Beware that in the first one there is a mistake in the second-to-last picture: the primary key should be
PRIMARY KEY ((user_id, tweet_id))
with double parentheses around the columns instead of single ones.
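Why the extra parentheses matter can be shown with a small sketch. It models a PRIMARY KEY definition as a Python tuple, where a nested tuple in first position stands for the inner parentheses; `split_primary_key` is a hypothetical helper written for this illustration, not part of any driver.

```python
def split_primary_key(primary_key):
    """Return (partition key columns, clustering columns).

    The first element of a PRIMARY KEY is the partition key; if it is
    itself parenthesized (a nested tuple here), the partition key is
    composite. Everything after it is a clustering column.
    """
    first, *clustering = primary_key
    partition = list(first) if isinstance(first, tuple) else [first]
    return partition, clustering

# PRIMARY KEY (user_id, tweet_id): user_id partitions, tweet_id clusters.
single = split_primary_key(("user_id", "tweet_id"))

# PRIMARY KEY ((user_id, tweet_id)): both columns form the partition key.
double = split_primary_key((("user_id", "tweet_id"),))

# PRIMARY KEY ((sensor, day), time): composite partition key plus clustering.
bucketed = split_primary_key((("sensor", "day"), "time"))
```

With single parentheses, all tweets of a user share one partition that can grow wide. With double parentheses, each (user_id, tweet_id) pair is its own single-row partition. So what allows several rows per partition is having clustering columns, not how many columns the primary key or the partition key has.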
Upvotes: 7