user1071714

Reputation: 51

Cassandra Range Queries

I'm new to Cassandra and trying out data modelling and range queries.

For learning purposes, I want to build a database that stores log lines together with their LogType and generation time. It has to answer the following query:

Find log lines of a given LogType within a date range.

I modeled the database as two column families:

1) log

create column family log with comparator = 'UTF8Type' 
and key_validation_class = 'LexicalUUIDType'
and column_metadata=[{column_name: block, validation_class: UTF8Type}];

where I plan to store each log line under its log ID.

For example:

set log['7561a442-24e2-11df-8924-001ff3591711'][block]='someText|11-17-2011 23:40:42|sometext';
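Because the generation time is embedded inside the stored value, it has to be pulled out before it can be used in any time-ordered index. A minimal sketch, assuming every value has exactly the `text|MM-DD-YYYY HH:MM:SS|text` shape of the example above (the helper name `parse_log_value` is hypothetical):

```python
from datetime import datetime

# Assumed timestamp format, inferred from the example value (month-day-year, 24h clock).
LOG_TIME_FORMAT = "%m-%d-%Y %H:%M:%S"

def parse_log_value(value: str) -> tuple[str, datetime, str]:
    """Split a 'prefix|timestamp|suffix' value into its three fields (hypothetical helper)."""
    # maxsplit=2 keeps any further '|' characters inside the last field.
    prefix, timestamp, suffix = value.split("|", 2)
    return prefix, datetime.strptime(timestamp, LOG_TIME_FORMAT), suffix

prefix, generated_at, suffix = parse_log_value("someText|11-17-2011 23:40:42|sometext")
print(generated_at.isoformat())  # 2011-11-17T23:40:42
```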

2) ltype

create column family ltype with column_type = 'Super'
and comparator = 'TimeUUIDType'
and subcomparator = 'UTF8Type'
and column_metadata=[{column_name: id, validation_class: LexicalUUIDType}];

In this column family I will store the log type as the row key, the generation time as the super column name, and the ID of the log line from the log column family as the id subcolumn:

For example:

set ltype[ltype1][12307245916538][id]='7561a442-24e2-11df-8924-001ff3591711';

Given a log type and a date range, I want to get back the matching log lines.

Can someone show me how to run a range query on a super column family?
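Conceptually, a range query on one row of `ltype` is a slice over its sorted super column names between a start and a finish bound; in the Thrift API this corresponds to `get_slice` with a `SlicePredicate` holding a `SliceRange(start, finish, reversed, count)`. The sketch below is only an in-memory stand-in for that behavior, not a Cassandra client: `SuperRow` is a hypothetical class, and it uses plain integer timestamps like the example above (with the `TimeUUIDType` comparator declared for `ltype`, the real bounds would have to be TimeUUIDs).

```python
import bisect

class SuperRow:
    """Hypothetical in-memory model of one ltype row: super column names kept
    sorted (as the comparator keeps them), each mapping to its subcolumns."""

    def __init__(self):
        self._names = []     # sorted super column names
        self._columns = {}   # super column name -> {subcolumn: value}

    def insert(self, name, subcolumns):
        if name not in self._columns:
            bisect.insort(self._names, name)
        self._columns[name] = subcolumns

    def slice(self, start, finish, count=100):
        """Up to `count` (name, subcolumns) pairs with start <= name <= finish."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [(n, self._columns[n]) for n in self._names[lo:hi][:count]]

row = SuperRow()
row.insert(12307245916538, {"id": "7561a442-24e2-11df-8924-001ff3591711"})
row.insert(12307245920000, {"id": "hypothetical-id-2"})
row.insert(12307245990000, {"id": "hypothetical-id-3"})

# Collect log IDs whose time falls inside the (inclusive) range.
ids = [cols["id"] for _, cols in row.slice(12307245900000, 12307245950000)]
print(ids)
```

The IDs returned by the slice would then be looked up in the `log` column family to fetch the actual log lines.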

Upvotes: 5

Views: 5264

Answers (1)

zznate

Reputation: 1908

An article on time series data modelling in Cassandra:

http://rubyscale.com/2011/basic-time-series-with-cassandra/

For time series, you really want wider rows: probably in the neighborhood of 10k-50k columns per row as a starting point (depending on your load). You can avoid super columns completely if you make the row key a function of a "date bucket":

[datetime]_[5-second interval] (again, choose the granularity based on your load)
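One possible reading of that key format, as a minimal sketch: the date plus the index of the 5-second interval within that day. Prefixing the log type so each type gets its own buckets is an assumption added here to match the question's query, and `bucket_key` is a hypothetical helper name.

```python
from datetime import datetime

BUCKET_SECONDS = 5  # granularity; tune to your write load (must divide 86400 evenly)

def bucket_key(log_type: str, ts: datetime) -> str:
    """Hypothetical row key '<log_type>_<YYYYMMDD>_<interval index within the day>'."""
    seconds_into_day = ts.hour * 3600 + ts.minute * 60 + ts.second
    interval = seconds_into_day // BUCKET_SECONDS
    return f"{log_type}_{ts:%Y%m%d}_{interval}"

print(bucket_key("ltype1", datetime(2011, 11, 17, 23, 40, 42)))  # ltype1_20111117_17048
```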

This way your keys can be recomputed from the date range alone, and you just issue a multi_get with the keys for the buckets you want.
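To make "keys can be recomputed" concrete, here is a sketch that enumerates every bucket key covering a date range and then fetches those rows. It assumes the hypothetical `<log_type>_<YYYYMMDD>_<interval>` key format with 5-second buckets, and `multi_get` here is an in-memory stand-in over a dict, not a real client call; the stored column names and values are made up for illustration.

```python
from datetime import datetime, timedelta

BUCKET_SECONDS = 5  # must match the granularity used when writing

def bucket_key(log_type, ts):
    """Hypothetical row key '<log_type>_<YYYYMMDD>_<interval index within the day>'."""
    seconds_into_day = ts.hour * 3600 + ts.minute * 60 + ts.second
    return f"{log_type}_{ts:%Y%m%d}_{seconds_into_day // BUCKET_SECONDS}"

def bucket_keys(log_type, start, end):
    """All bucket row keys covering [start, end], in time order."""
    t = start.replace(microsecond=0)
    # Align t down to the first second of its bucket.
    offset = (t.hour * 3600 + t.minute * 60 + t.second) % BUCKET_SECONDS
    t -= timedelta(seconds=offset)
    keys = []
    while t <= end:
        keys.append(bucket_key(log_type, t))
        t += timedelta(seconds=BUCKET_SECONDS)
    return keys

def multi_get(store, keys):
    """In-memory stand-in for a multiget: return only the rows that exist."""
    return {k: store[k] for k in keys if k in store}

# Hypothetical data: each bucket row maps a time column to a log ID.
store = {
    "ltype1_20111117_17048": {"23:40:42": "7561a442-24e2-11df-8924-001ff3591711"},
    "ltype1_20111117_17050": {"23:40:51": "hypothetical-id"},
}
keys = bucket_keys("ltype1", datetime(2011, 11, 17, 23, 40, 42),
                   datetime(2011, 11, 17, 23, 40, 54))
rows = multi_get(store, keys)
```

The first and last buckets can contain entries slightly outside the requested range, so the client should still filter those rows by time.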

A more general overview of data modeling:

http://www.datastax.com/docs/0.8/ddl/index

Upvotes: 6
