cool breeze

Reputation: 4811

Trying to visualize how wide and skinny rows are laid out

Can someone explain, and show me, how the data is laid out when you design your tables for wide vs. skinny rows?

I'm not sure I fully grasp how the data is spread out with a "wide" row.

Is there a difference in how you can fetch the data, or is it the same? That is, if the data is ordered, does it matter whether it is organized vertically (skinny) or horizontally (wide)?

Update: Is a table considered wide if the primary key consists of more than one column? Or does a table have wide rows only if the partition key is a composite partition key?

Upvotes: 1

Views: 867

Answers (1)

xmas79

Reputation: 5180

Wide... Skinny... Terms that make your head explode... I prefer to oversimplify it like this:

  1. All tables have wide rows
  2. You simply need to take care of how wide the rows get

This allows me to think of it as follows (mangling the C* terminology a bit):

        Number of RECORDS in a partition
1 <--------------------------------------- ... 2Billion
      ^                         ^
  Skinny rows                  wide rows

The fewer records there are in a partition, the skinnier the "partition" is, and vice versa.
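To make the picture above a bit more concrete, here is a toy in-memory model in Python. It is only a sketch of the idea, not how Cassandra actually stores data: the helper group_into_partitions and the sample rows are made up for illustration. Records are grouped by partition key, sorted by clustering key inside each group, and the "width" of a partition is simply the number of records in it.

```python
from collections import defaultdict


def group_into_partitions(rows, partition_cols, clustering_cols):
    """Group rows by partition key and sort each group by clustering key.

    Hypothetical helper: a toy model of the layout, not Cassandra's storage engine.
    """
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in partition_cols)
        partitions[key].append(row)
    for records in partitions.values():
        records.sort(key=lambda r: tuple(r[col] for col in clustering_cols))
    return dict(partitions)


# Made-up sample data.
users = [
    {"username": "xmas79", "email": "x@example.com"},
    {"username": "cool breeze", "email": "c@example.com"},
]
readings = [
    {"sensor": "pressure", "time": "2016-09-23", "value": 1012},
    {"sensor": "pressure", "time": "2016-09-21", "value": 1009},
    {"sensor": "pressure", "time": "2016-09-22", "value": 1011},
    {"sensor": "humidity", "time": "2016-09-22", "value": 40},
]

# PRIMARY KEY (username): one record per partition, the "skinny" end.
skinny = group_into_partitions(users, ["username"], [])
# PRIMARY KEY (sensor, time): many records per partition, the "wide" end.
wide = group_into_partitions(readings, ["sensor"], ["time"])

skinny_widths = {key: len(records) for key, records in skinny.items()}
wide_widths = {key: len(records) for key, records in wide.items()}
```

In this model every username partition has width 1, while the pressure partition grows by one record for every new measurement.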

When designing for C* I always keep in mind a couple of things:

  • I want to use "skinny partitions" when my data can be fetched with one query and it is fully contained in one record of one partition. A typical example is something along the lines of SELECT * FROM table WHERE username = 'xmas79'; where the table has a primary key in the form of PRIMARY KEY (username), which lets me get all the data belonging to a particular username.
  • I want to use "wide rows" when my data can be fetched with one query and it is fully contained in multiple records of one partition. Typical examples are range queries like SELECT * FROM table WHERE sensor = 'pressure' AND time >= '2016-09-22';, where the table has a primary key in the form of PRIMARY KEY (sensor, time).
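The two access patterns above can be sketched with a toy Python model. Everything here is an assumption made for illustration (the dictionaries, the sample values, and the functions get_user and readings_since); it only mimics the shape of the two queries and says nothing about Cassandra internals. A skinny partition is a single lookup by key, while a wide partition is a sorted list of records that you slice from a starting point.

```python
from bisect import bisect_left

# Toy model of PRIMARY KEY (username): partition key -> exactly one record.
users_by_name = {
    "xmas79": {"email": "x@example.com"},
}

# Toy model of PRIMARY KEY (sensor, time): partition key -> records
# kept sorted by the clustering column (time, as ISO date strings).
readings_by_sensor = {
    "pressure": [
        ("2016-09-21", 1009),
        ("2016-09-22", 1011),
        ("2016-09-23", 1012),
    ],
}


def get_user(username):
    """Mimics: SELECT * FROM table WHERE username = ?"""
    return users_by_name.get(username)


def readings_since(sensor, start_time):
    """Mimics: SELECT * FROM table WHERE sensor = ? AND time >= ?"""
    records = readings_by_sensor.get(sensor, [])
    # Records are sorted by time, so find the first one with time >= start_time
    # and return the contiguous slice from there.
    first = bisect_left(records, (start_time,))
    return records[first:]


user = get_user("xmas79")
recent = readings_since("pressure", "2016-09-22")
```

Both calls touch a single partition; the difference is whether that partition holds one record or a sorted run of them.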

So: the first approach is for one-shot queries, the second for range queries. Beware that the second approach has the (major) drawback that you can keep adding data to the partition, so it gets wider and wider, hurting performance.

In order to control how wide your partitions are, you need to add something to the partition key. In the sensor example above, provided it doesn't violate your requirements, you can "group" measurements by date, e.g. split the measurements into day-by-day groups, making the primary key PRIMARY KEY ((sensor, day), time), where the partition key has become (sensor, day). With this approach, you have full (well, let's say good at least) control over the width of your partitions.
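As a rough illustration of the effect of adding day to the partition key, here is a small Python sketch. The sample readings and the helper partition_widths are invented for the example, and deriving the day from the first 10 characters of an ISO timestamp is my own assumption. It counts how many records land in each partition under the two key designs.

```python
from collections import Counter

# Made-up measurements: (sensor, ISO timestamp).
readings = [
    ("pressure", "2016-09-21T08:00"),
    ("pressure", "2016-09-21T20:00"),
    ("pressure", "2016-09-22T08:00"),
    ("pressure", "2016-09-22T20:00"),
    ("pressure", "2016-09-23T08:00"),
]


def partition_widths(rows, partition_key):
    """Count records per partition for a given partition-key function."""
    return Counter(partition_key(sensor, time) for sensor, time in rows)


# PRIMARY KEY (sensor, time): the partition key is just the sensor.
by_sensor = partition_widths(readings, lambda sensor, time: (sensor,))

# PRIMARY KEY ((sensor, day), time): the partition key is (sensor, day),
# where day is taken here as the date part of the timestamp.
by_sensor_day = partition_widths(readings, lambda sensor, time: (sensor, time[:10]))

widest_unbucketed = max(by_sensor.values())
widest_bucketed = max(by_sensor_day.values())
```

With the sensor-only key, all five readings pile up in one partition that keeps growing; with (sensor, day), no partition holds more than one day of data. The price is that a range spanning several days now has to read several partitions.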

You only need to find a good compromise between your query capabilities and the desired performance.

I suggest these three readings for further investigation of the details:

  1. Wide Rows in Cassandra CQL
  2. Does CQL support dynamic columns / wide rows?
  3. CQL3 for Cassandra experts

Beware that in the first reading there's a mistake in the second-to-last picture: the primary key should be

PRIMARY KEY ((user_id, tweet_id))

with double parentheses around the columns instead of single ones.

Upvotes: 7
