Reputation: 51
I'm new to Cassandra and am trying out data modelling and range queries.
For learning purposes I want to build a database that stores log lines along with their LogType and generation time, and that has to answer the following query:
Find log lines of a given LogType within a date range.
I modelled my database as two column families: 1) log
create column family log with comparator = 'UTF8Type'
and key_validation_class = 'LexicalUUIDType'
and column_metadata=[{column_name: block, validation_class: UTF8Type}];
where I plan to store log lines keyed by their log IDs,
ex: set log['7561a442-24e2-11df-8924-001ff3591711'][block]='someText|11-17-2011 23:40:42|sometext';
2)
create column family ltype with column_type = 'Super'
and comparator = 'TimeUUIDType'
and subcomparator = 'UTF8Type'
and column_metadata=[{column_name: id, validation_class: LexicalUUIDType}];
In this column family I store the log type as the row key, the generation time as a TimeUUID super column, and the log line id from the log column family as a subcolumn:
ex: set ltype[ltype1][12307245916538][id]='7561a442-24e2-11df-8924-001ff3591711';
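For context, this is roughly how I do the same inserts from Python with pycassa (just a sketch; the 'Logs' keyspace name and the host are placeholders I made up):

    import uuid
    from datetime import datetime

    import pycassa
    from pycassa.util import convert_time_to_uuid

    pool = pycassa.ConnectionPool('Logs', ['localhost:9160'])
    log_cf = pycassa.ColumnFamily(pool, 'log')      # standard CF, LexicalUUID row keys
    ltype_cf = pycassa.ColumnFamily(pool, 'ltype')  # super CF, TimeUUID super columns

    log_id = uuid.uuid1()
    log_time = datetime(2011, 11, 17, 23, 40, 42)

    # 1) store the raw log line under its id
    log_cf.insert(log_id, {'block': 'someText|11-17-2011 23:40:42|sometext'})

    # 2) index it by log type: row key = log type, super column = TimeUUID of
    #    the log time, subcolumn 'id' = pointer back into the log CF
    ltype_cf.insert('ltype1', {convert_time_to_uuid(log_time): {'id': log_id}})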
I want to get results when given a log type and a date range.
Can someone guide me on how to run a range query on a super column family?
Upvotes: 5
Views: 5264
Reputation: 1908
An article on time series data modelling in Cassandra:
http://rubyscale.com/2011/basic-time-series-with-cassandra/
For time series, you really want to use larger rows - probably in the neighborhood of 10k-50k columns per row as a starting point (depending on your load). You can avoid super columns completely if you make the row key a function of a "date bucket":
[datetime]_[5 second interval] (granularity again depending on load)
This way your keys can be recreated on the read side, and you just issue a multi_get with the keys for the buckets you want.
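To make that concrete, here is a rough sketch of the bucketed approach with pycassa (the 'logs_by_type' column family, 'Logs' keyspace and 5 second buckets are only example names/values, not a prescribed implementation):

    import pycassa
    from pycassa.util import convert_time_to_uuid

    BUCKET_SECONDS = 5

    def bucket_key(log_type, ts):
        """Row key = <log type>_<start of the 5 second bucket containing ts>."""
        bucket = int(ts) - (int(ts) % BUCKET_SECONDS)
        return '%s_%d' % (log_type, bucket)

    pool = pycassa.ConnectionPool('Logs', ['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'logs_by_type')  # standard CF, TimeUUID comparator

    # Writing: each log line goes into the bucket row for its type and time,
    # under a TimeUUID column so columns stay sorted by time within the row.
    def write_log(log_type, ts, line):
        cf.insert(bucket_key(log_type, ts), {convert_time_to_uuid(ts): line})

    # Reading a range: regenerate every bucket key covering [start, end] and
    # multiget them, trimming the edge buckets with a column slice.
    def read_range(log_type, start_ts, end_ts):
        first_bucket = int(start_ts) - int(start_ts) % BUCKET_SECONDS
        keys = [bucket_key(log_type, ts)
                for ts in range(first_bucket, int(end_ts) + 1, BUCKET_SECONDS)]
        return cf.multiget(keys,
                           column_start=convert_time_to_uuid(start_ts, lowest_val=True),
                           column_finish=convert_time_to_uuid(end_ts, lowest_val=False))

The point of the design is that no index lookup is ever needed: given a type and a time range, the client can compute exactly which row keys to fetch.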
A more general overview of data modeling:
http://www.datastax.com/docs/0.8/ddl/index
Upvotes: 6