Reputation: 156
I am new to Cassandra.
We have table structure is like this
CREATE TABLE keyspace.events (
id bigint,
msg_time bigint,
status int,
uuid timeuuid,
message text,
PRIMARY KEY (id, msg_time, status, uuid)
) WITH CLUSTERING ORDER BY (msg_time ASC, status ASC, uuid ASC)
CREATE INDEX timestamp ON hh_keyspace.game_events (msg_time);
We insert data with TTL of 32 days. Analytics team wants only last 1 day of data. Query with msg_time > '' ALLOW FILTERING will have huge performance impact.
Analytics team run the query daily. Are there any other ways to get the data.
Upvotes: 1
Views: 901
Reputation: 156
I realized that the best way is to duplicate the data because writes are cheap in Cassandra. We write to another table with different key structure.
Ref:
1) http://blog.websudos.com/2014/08/16/a-series-on-cassandra-part-1-getting-rid-of-the-sql-mentality/ (Section - 4. Duplicate data and maintain consistency at application level)
2) http://blog.websudos.com/2014/08/23/a-series-on-cassandra-part-2-indexes-and-keys/ (Section - Secondary indexes)
Upvotes: 1
Reputation: 8812
Query with msg_time > '' ALLOW FILTERING will have huge performance impact.
ALLOW FILTERING means SURELY TIMEOUT on a normal production dataset. No surprise here.
Are there any other ways to get the data
Model your table so that the query (give me the last day of data) is very fast. How many events do you expect to have, for the biggest load ever possible, per day ?
Upvotes: 0