Reputation: 479

Cassandra data aggregation and rollup

What is the best way to aggregate data and store the results back in a Cassandra cluster? For example, given a table with hourly data, I want to aggregate by day and save the result in a different table. This can be achieved with a simple select and insert for every key/period, but is there a better or different way? What about materialized views?

Upvotes: 0

Answers (1)

barth

Reputation: 441

Materialized views

Usage of materialized views in Cassandra is quite limited:

All primary key columns of the source table must appear in the view's primary key, possibly in a different order.
Aggregate functions such as avg cannot be used.
GROUP BY is not allowed.

So I do not think they are suitable for your time-based rollup, or for any other aggregation.

By the way, materialized views have been retroactively classified as experimental and are not recommended for new production use.

Manual solution

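The select-and-insert approach works well as long as the data to aggregate is frozen for good, for example hours that are already closed and will never be rewritten. If source rows can still change after the rollup has run, keeping the aggregate consistent with them will be hard.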
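As a rough illustration of the select-and-insert rollup, here is a minimal Python sketch. The table names (metrics_hourly, metrics_daily), the column names, and the rollup_daily helper are all hypothetical and not from the question; the CQL strings only show the shape of the statements a driver would run, and the aggregation itself is plain Python so it can be tried without a cluster. Hours are assumed to be stored in UTC.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical schema, assumed for illustration only:
#   metrics_hourly(sensor_id text, hour timestamp, total double, samples int,
#                  PRIMARY KEY (sensor_id, hour))
#   metrics_daily(sensor_id text, day date, total double, samples int,
#                 PRIMARY KEY (sensor_id, day))
SELECT_HOURLY = (
    "SELECT sensor_id, hour, total, samples FROM metrics_hourly "
    "WHERE sensor_id = ? AND hour >= ? AND hour < ?"
)
INSERT_DAILY = (
    "INSERT INTO metrics_daily (sensor_id, day, total, samples) "
    "VALUES (?, ?, ?, ?)"
)


def rollup_daily(hourly_rows):
    """Fold (sensor_id, hour, total, samples) rows into one row per sensor and day."""
    acc = defaultdict(lambda: [0.0, 0])  # (sensor_id, day) -> [sum, count]
    for sensor_id, hour, total, samples in hourly_rows:
        key = (sensor_id, hour.date())
        acc[key][0] += total
        acc[key][1] += samples
    # Each tuple matches the bind parameters of INSERT_DAILY.
    return [(sid, day, t, n) for (sid, day), (t, n) in sorted(acc.items())]


hourly = [
    ("s1", datetime(2024, 1, 1, 0), 10.0, 2),
    ("s1", datetime(2024, 1, 1, 1), 20.0, 3),
    ("s1", datetime(2024, 1, 2, 0), 5.0, 1),
]
daily = rollup_daily(hourly)
```

Storing the sum and the sample count, rather than a precomputed average, lets coarser rollups (week, month) be derived from the daily rows. Since a Cassandra INSERT is an upsert on the primary key, re-running the job for the same day overwrites the previous daily row instead of duplicating it.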

Indexes

A completely different approach to the rollup would be to use Elassandra to index the temporal column. An Elasticsearch secondary index will be created and kept in sync automatically. You can then use the embedded Elasticsearch API to query at different time scales, using a date histogram aggregation.

This way the result of the aggregation is not stored, but calculated in real time from an efficient secondary data structure.
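For reference, a date histogram request could look roughly like the body built below. This is only a sketch: the field names hour and total and the date_histogram_query helper are hypothetical, and the interval parameter is the spelling used by the older Elasticsearch versions that Elassandra builds on (newer Elasticsearch releases use calendar_interval or fixed_interval instead), so check it against the version you run.

```python
import json


def date_histogram_query(time_field, interval, value_field):
    """Build a search body that buckets documents by time and averages a field."""
    return {
        "size": 0,  # return only the aggregation buckets, no individual hits
        "aggs": {
            "per_bucket": {
                "date_histogram": {"field": time_field, "interval": interval},
                "aggs": {"avg_value": {"avg": {"field": value_field}}},
            }
        },
    }


# Field names are assumptions; switch "day" to "hour" or "month" for other scales.
body = date_histogram_query("hour", "day", "total")
print(json.dumps(body, indent=2))
```

The body would be sent to the _search endpoint of the index that Elassandra maintains for the table; changing the time scale only means changing the interval, with nothing to recompute or store.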