Reputation: 1882

Tips for monitoring data model with cassandra

I'm relatively new to Cassandra and have to evaluate different NoSQL solutions for a monitoring tool. Each datum is only about 100 bytes, but there are a great many of them: we get about 15 million records per day. I'm currently testing with 900 million records (about 15 GB as an SQL insert script).

My first question is: does Cassandra fit my needs? I need to do range queries (on the date the records were created) and sum up some of the columns by groups defined by "secondary indexes" stored in the datum.

I already tried MongoDB, but its MapReduce performed really poorly. I also read about HBase, but the enormous amount of configuration it needs makes me hope there is a solution with Cassandra.

My second question is: how should I store my data so that I can access it in the ways mentioned above? I thought of a super column family where the key is the date (as a long, since 1970) and the columns are the data recorded at that time. But as far as I know, with the RandomPartitioner I can't do fast range queries on it, and with the OrderPreservingPartitioner the data won't be spread over my cluster (currently consisting of two nodes).

I hope I gave you all the necessary information... Thank you for your help!

andy

Upvotes: 2

Answers (2)

jrydberg

Reputation: 629

We had a similar situation.

We store our data in simple rows, where the row key is in the form <id>:<time-bucket>. Our current time bucket size is 24h. The column is the timestamp, and the value is a small object serialized with msgpack.
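To make the row-key scheme concrete, here is a minimal sketch in Python. It assumes Unix timestamps in seconds and a bucket index obtained by integer division; the function names and the example id are hypothetical, not taken from our actual code. A date range query then becomes a lookup of a small, known set of row keys, which works with the RandomPartitioner.

```python
from typing import List

BUCKET_SECONDS = 24 * 60 * 60  # 24h buckets, matching the bucket size above


def bucket_of(ts: int) -> int:
    """Return the bucket index for a Unix timestamp given in seconds."""
    return ts // BUCKET_SECONDS


def row_key(source_id: str, ts: int) -> str:
    """Build a row key of the form <id>:<time-bucket>."""
    return f"{source_id}:{bucket_of(ts)}"


def row_keys_for_range(source_id: str, start_ts: int, end_ts: int) -> List[str]:
    """List every row key whose bucket overlaps [start_ts, end_ts]."""
    first, last = bucket_of(start_ts), bucket_of(end_ts)
    return [f"{source_id}:{b}" for b in range(first, last + 1)]
```

Within each returned row, the columns are ordered by timestamp, so the first and last buckets can be trimmed with a column slice and the buckets in between read whole.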

We do aggregation manually if needed.
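As a rough illustration of what "manually" can mean here, the sketch below sums one field per group on the client side after the records have been fetched. The record layout (a dict per timestamp) and the field names are assumptions made for the example.

```python
from collections import defaultdict


def sum_by_group(records, group_field, value_field, start_ts, end_ts):
    """Sum value_field per distinct group_field for records in [start_ts, end_ts].

    records is an iterable of (timestamp, record_dict) pairs, e.g. the
    deserialized columns of one or more bucket rows.
    """
    totals = defaultdict(int)
    for ts, rec in records:
        if start_ts <= ts <= end_ts:
            totals[rec[group_field]] += rec[value_field]
    return dict(totals)
```

For example, `sum_by_group(records, "host", "bytes", start, end)` would return the total of the hypothetical `bytes` field per `host` over the requested time range.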

We also do a small optimization: when the bucket is full, it becomes immutable, so we create an "all" object holding all values in a single column. Then the per-timestamp columns can be purged. This lets us fetch a whole bucket with a single column read and deserialize it in one step, rather than scanning through every column in the row.
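The sealing step described above could look roughly like this. It is only a sketch: a row is modelled as a plain dict of column name to serialized value, and `json` stands in for msgpack so that the example has no third-party dependency. The column name `all` and both function names are assumptions.

```python
import json

ALL_COLUMN = "all"  # hypothetical name of the combined column


def seal_bucket(row: dict) -> dict:
    """Collapse per-timestamp columns into a single 'all' column.

    row maps integer timestamps to serialized values (JSON here, msgpack
    in the real setup). The returned row holds only the 'all' column, so
    the per-timestamp columns can be purged.
    """
    if ALL_COLUMN in row:
        return row  # already sealed
    values = {str(ts): json.loads(raw) for ts, raw in sorted(row.items())}
    return {ALL_COLUMN: json.dumps(values)}


def read_bucket(row: dict) -> dict:
    """Return {timestamp: value} for a sealed or unsealed row."""
    if ALL_COLUMN in row:
        # Sealed: one column read, one deserialization call.
        return {int(ts): v for ts, v in json.loads(row[ALL_COLUMN]).items()}
    # Unsealed: deserialize every column individually.
    return {ts: json.loads(raw) for ts, raw in row.items()}
```

Readers go through `read_bucket` in both cases, so the switch from per-timestamp columns to the combined column is invisible to the aggregation code.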

Upvotes: 0

jbellis

Reputation: 19377

Sounds like a job for Brisk (Cassandra + Hadoop distribution). Full Hadoop map/reduce including Hive support, virtually no configuration required.

http://www.datastax.com/products/brisk

Upvotes: 3
