Reputation: 2291
I am trying to evaluate Cassandra DB performance for storing and retrieving time series data of different channels.
The data is recorded at a maximum rate of 8 samples/sec in a file format, along with a millisecond timestamp for each sample. The number of channels recording at a given time may vary.
Inspired by the article Getting Started with Time Series Data Modeling, I created the following table:
CREATE TABLE uhhdata ( ch_idx int, date timestamp, dt timestamp, val float, PRIMARY KEY ((ch_idx, date), dt) );
where the partition key is composed of the channel number (ch_idx int) and date, a timestamp that stores only the date and not the time of day, and dt is the timestamp of the record with sub-second resolution.
I have two problems: 1- After writing 2,500,000 records into this table and running the query select * from UHHdata limit 10,000,000; I got the following timeout error:
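A minimal sketch of how the date part of the partition key could be derived from a sample's millisecond timestamp, so that all samples of one channel and one day land in the same partition. The helper name `day_bucket_ms` is hypothetical, and the sketch assumes UTC day boundaries and non-negative epoch timestamps:

```cpp
#include <cassert>
#include <cstdint>

// Number of milliseconds in one UTC day.
constexpr std::int64_t kMsPerDay = 24LL * 60 * 60 * 1000;

// Truncate a millisecond epoch timestamp to midnight UTC of the same day.
// Hypothetical helper; assumes ts_ms >= 0 (samples recorded after 1970-01-01).
std::int64_t day_bucket_ms(std::int64_t ts_ms) {
    assert(ts_ms >= 0);
    return ts_ms - (ts_ms % kMsPerDay);
}
```

The returned value would be written to the date column, while the untruncated timestamp goes into dt.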
Request did not complete within rpc_timeout.
The C++ driver simply returns NULL for this number of records: boost::shared_ptr result = future.get().result;
if(!result) std::cout << "No result record\n";
If I do this for 100,000 records, it returns after 22 seconds. How can I retrieve all the records for big queries like this? I have seen the post "cassandra get all records in time range"; however, I do not know how it applies to my case, as I need to get all records, not just some of them.
2- If I do a range query on the dt timestamp as follows, the returned rows do not seem to respect the specified interval; the result is the same irrespective of the lower and upper time limits:
As can be observed, the query returns records beyond the upper time limit '2014-04-04 01:00:10':
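One possible direction, sketched under assumptions: since the table is partitioned by (ch_idx, date), the full data set could be read one partition at a time instead of through a single unbounded select, which keeps each request small. The function name `partition_queries` is hypothetical, the set of channels and days is assumed to be known by the client, and the date is passed as an integer literal in epoch milliseconds:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Build one SELECT per (ch_idx, date) partition of the uhhdata table.
// days_ms holds the midnight bucket of each recorded day, in epoch milliseconds.
// Hypothetical helper; a real client would more likely bind these values
// to a prepared statement than concatenate strings.
std::vector<std::string> partition_queries(const std::vector<int>& channels,
                                           const std::vector<std::int64_t>& days_ms) {
    std::vector<std::string> queries;
    queries.reserve(channels.size() * days_ms.size());
    for (int ch : channels) {
        for (std::int64_t day : days_ms) {
            queries.push_back("SELECT * FROM uhhdata WHERE ch_idx = " +
                              std::to_string(ch) + " AND date = " +
                              std::to_string(day));
        }
    }
    return queries;
}
```

Each generated query touches exactly one partition, so the results can be fetched and processed incrementally.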
cqlsh:uhhkeyspace2> select * from UHHData where ch_idx=1 AND date = '2012-04-04 01:00:00' AND dt < '2014-04-04 01:00:10' LIMIT 20;
ch_idx | date | dt | val
--------+--------------------------------------+--------------------------------------+-----
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:00GMT Daylight Time | -5
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:01GMT Daylight Time | 44
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:02GMT Daylight Time | 83
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:03GMT Daylight Time | 99
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:04GMT Daylight Time | 89
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:05GMT Daylight Time | 55
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:06GMT Daylight Time | 5
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:07GMT Daylight Time | -44
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:08GMT Daylight Time | -83
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:09GMT Daylight Time | -99
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:10GMT Daylight Time | -89
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:11GMT Daylight Time | -55
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:12GMT Daylight Time | -5
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:13GMT Daylight Time | 44
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:14GMT Daylight Time | 83
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:15GMT Daylight Time | 99
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:16GMT Daylight Time | 89
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:17GMT Daylight Time | 55
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:18GMT Daylight Time | 5
1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:19GMT Daylight Time | -44
(20 rows)
Why are the timestamp limit conditions not applied? How can I fix this?
Thanks, Amin
Upvotes: 1
Views: 1802
Reputation: 3374
I don't see any problems. All your timestamps in the dt column are from 2012-04-04 and your condition is dt < '2014-04-04 01:00:10'. 2012 is before 2014, so everything is correct.
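If the intent was to cut the result off within the same day, both bounds need to fall inside the partition's own date range. A minimal sketch of building such a query, assuming timestamps are passed as epoch milliseconds; the function name `range_query` and its parameters are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Build a range query on dt for a single (ch_idx, date) partition.
// Selects rows with from_ms <= dt < to_ms; all values are epoch milliseconds.
// Hypothetical helper for illustration only.
std::string range_query(int ch_idx, std::int64_t day_ms,
                        std::int64_t from_ms, std::int64_t to_ms) {
    assert(from_ms < to_ms);  // an empty or inverted interval is a caller error
    return "SELECT * FROM uhhdata WHERE ch_idx = " + std::to_string(ch_idx) +
           " AND date = " + std::to_string(day_ms) +
           " AND dt >= " + std::to_string(from_ms) +
           " AND dt < " + std::to_string(to_ms);
}
```

With an upper bound ten seconds after the start of the partition's day, only the samples from those first ten seconds would match.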
Upvotes: 1