Srinivasa T N

Reputation: 63

Number of rows/columns in table

I was experimenting with the time series example for Cassandra described at http://planetcassandra.org/getting-started-with-time-series-data-modeling/. How can I verify that the example in figure 2 (partitioning the row by weather station and date) has created only two rows, each containing two columns?

Regards, Seenu.

Upvotes: 0

Views: 134

Answers (1)

Jim Meyer

Reputation: 9475

You can query each partition key (a storage "row") and see how many "columns" it contains. Note that in CQL, each clustering column value appears as its own row, and all rows that share a partition key are stored together, so the example produces what looks like four rows in CQL.

SELECT event_time, temperature FROM temperature_by_day WHERE weatherstation_id='1234ABCD' AND date='2013-04-03';

 event_time               | temperature
--------------------------+-------------
 2013-04-03 07:01:00-0400 |         72F
 2013-04-03 07:02:00-0400 |         73F

SELECT event_time, temperature FROM temperature_by_day WHERE weatherstation_id='1234ABCD' AND date='2013-04-04';

 event_time               | temperature
--------------------------+-------------
 2013-04-04 07:01:00-0400 |         73F
 2013-04-04 07:02:00-0400 |         74F

Or get all the clustered columns at once:

SELECT event_time, temperature FROM temperature_by_day WHERE weatherstation_id='1234ABCD' AND date IN ('2013-04-03', '2013-04-04');

 event_time               | temperature
--------------------------+-------------
 2013-04-03 07:01:00-0400 |         72F
 2013-04-03 07:02:00-0400 |         73F
 2013-04-04 07:01:00-0400 |         73F
 2013-04-04 07:02:00-0400 |         74F

Or just look at the contents of the entire table:

SELECT * FROM temperature_by_day;

 weatherstation_id | date       | event_time               | temperature
-------------------+------------+--------------------------+-------------
          1234ABCD | 2013-04-04 | 2013-04-04 07:01:00-0400 |         73F
          1234ABCD | 2013-04-04 | 2013-04-04 07:02:00-0400 |         74F
          1234ABCD | 2013-04-03 | 2013-04-03 07:01:00-0400 |         72F
          1234ABCD | 2013-04-03 | 2013-04-03 07:02:00-0400 |         73F

To see how the data is stored on disk, flush the keyspace to disk (for example with nodetool flush) and then run the sstable2json utility on the data file. The output shows that each partition key is stored only once and that the cells are stored in sorted order by clustering column within each partition.

root@c1:/var/lib/cassandra/data/tkeyspace/temperature_by_day-e1a74970912211e4aa1ea3121441a41b# sstable2json tkeyspace-temperature_by_day-ka-1-Data.db
[
{"key": "1234ABCD:2013-04-04",
 "cells": [["2013-04-04 07\\:01-0400:","",1420054084914905],
           ["2013-04-04 07\\:01-0400:temperature","73F",1420054084914905],
           ["2013-04-04 07\\:02-0400:","",1420054155058044],
           ["2013-04-04 07\\:02-0400:temperature","74F",1420054155058044]]},
{"key": "1234ABCD:2013-04-03",
 "cells": [["2013-04-03 07\\:01-0400:","",1420054017282283],
           ["2013-04-03 07\\:01-0400:temperature","72F",1420054017282283],
           ["2013-04-03 07\\:02-0400:","",1420054049403031],
           ["2013-04-03 07\\:02-0400:temperature","73F",1420054049403031]]}
]
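
If you would rather count than eyeball the dump, here is a minimal Python sketch that parses the sstable2json output above and reports how many partitions exist and how many CQL rows each one holds. It assumes the cell-name layout seen in this dump ("<clustering value>:<column name>", with an empty column name for the row marker); the function name summarize_partitions is just a hypothetical helper, not part of any Cassandra tool.

```python
import json

# The sstable2json output from above, pasted verbatim as a raw string.
SSTABLE_DUMP = r'''
[
{"key": "1234ABCD:2013-04-04",
 "cells": [["2013-04-04 07\\:01-0400:","",1420054084914905],
           ["2013-04-04 07\\:01-0400:temperature","73F",1420054084914905],
           ["2013-04-04 07\\:02-0400:","",1420054155058044],
           ["2013-04-04 07\\:02-0400:temperature","74F",1420054155058044]]},
{"key": "1234ABCD:2013-04-03",
 "cells": [["2013-04-03 07\\:01-0400:","",1420054017282283],
           ["2013-04-03 07\\:01-0400:temperature","72F",1420054017282283],
           ["2013-04-03 07\\:02-0400:","",1420054049403031],
           ["2013-04-03 07\\:02-0400:temperature","73F",1420054049403031]]}
]
'''


def summarize_partitions(dump_text):
    """Map each partition key to the number of CQL rows stored under it.

    Assumption (based on the dump above): every cell name ends with
    ":<column name>", so stripping the last ":"-separated piece leaves
    the clustering value that identifies one CQL row.
    """
    summary = {}
    for partition in json.loads(dump_text):
        clustering_values = {
            cell[0].rsplit(":", 1)[0] for cell in partition["cells"]
        }
        summary[partition["key"]] = len(clustering_values)
    return summary


if __name__ == "__main__":
    summary = summarize_partitions(SSTABLE_DUMP)
    print("partitions:", len(summary))
    for key, row_count in summary.items():
        print(key, "->", row_count, "CQL rows")
```

For the dump above this reports two partitions with two CQL rows each, which matches the two "rows" with two "columns" shown in figure 2 of the article.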

Upvotes: 1
