Reputation: 97
We created a table “bidresponses” with the following schema:
CREATE TABLE yp_rtb_new.bidresponses (
    time_id bigint,
    campaignid int,
    bidid text,
    adid int,
    adsize text,
    appname text,
    …
    PRIMARY KEY (time_id, campaignid, bidid)
)
The table's TTL is set to 3 days, and we usually insert 20M records per day. We noticed something odd. During the first 3 days, we could run “select * from bidresponses limit 10”. After the third day, once the mass deletion caused by the TTL had happened, the same query timed out, while “select * from bidresponses where time_id=?” still ran without problems. We tried forcing a compaction, but it didn't help. After restarting the cluster, we could run “select * from bidresponses limit 10” again. Any idea?
Upvotes: 2
Views: 1521
Reputation: 97
Forcing a compaction does work. I had previously assumed that nodetool compact takes effect on every host in the cluster, but it only compacts the node it is run against, so some hosts had not been compacted. After I compacted the table on every host, the query worked.
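For anyone hitting the same thing, here is a minimal sketch of running the compaction against each node in turn. The host addresses are placeholders, not our real cluster, and it assumes nodetool is on the PATH and that remote JMX access is allowed. If it is not, run nodetool compact locally on each host instead.

```python
import subprocess

# Hypothetical node addresses; replace with the hosts of your own cluster.
HOSTS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]


def compact_commands(hosts, keyspace, table):
    """Build one `nodetool compact` command per host."""
    return [["nodetool", "-h", host, "compact", keyspace, table] for host in hosts]


def compact_everywhere(hosts, keyspace, table, dry_run=True):
    """Run (or, with dry_run=True, only print) the compaction on every host."""
    for cmd in compact_commands(hosts, keyspace, table):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if nodetool exits non-zero
            subprocess.run(cmd, check=True)


# Dry run by default, so nothing is executed until you pass dry_run=False.
compact_everywhere(HOSTS, "yp_rtb_new", "bidresponses")
```

A major compaction is I/O heavy, so going through the nodes one at a time, as this loop does, is probably safer than starting all of them at once.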
Upvotes: 0
Reputation: 1653
I'm guessing Cassandra had to read through a lot of tombstones (data marked for deletion) to find the live data. On top of that, "SELECT * FROM table;" is a full-table, multi-partition scan, which can time out depending on many factors (tombstones, number of nodes, number of partitions, etc.).
When you specified 'time_id=?', you told Cassandra exactly what partition you wanted which means fewer/no network hops and seeks to find the data.
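To make the difference concrete, here is a toy model in Python. It is only an illustration of my guess, not Cassandra's actual read path: the Cell class, the two functions, and the row counts are all made up, and real partitions are visited in token order rather than sorted by key.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    bidid: str
    expired: bool  # True models a tombstone left behind by TTL expiry


def scan_limit(partitions, limit):
    """Model of `SELECT * ... LIMIT n`: walk partitions until `limit` live rows are found."""
    live, tombstones_read = [], 0
    for time_id in sorted(partitions):  # simplification: real order is by token
        for cell in partitions[time_id]:
            if cell.expired:
                tombstones_read += 1  # skipped, but still has to be read
            else:
                live.append((time_id, cell.bidid))
                if len(live) == limit:
                    return live, tombstones_read
    return live, tombstones_read


def read_partition(partitions, time_id):
    """Model of `SELECT * ... WHERE time_id = ?`: touch only one partition."""
    cells = partitions.get(time_id, [])
    live = [(time_id, c.bidid) for c in cells if not c.expired]
    tombstones_read = sum(c.expired for c in cells)
    return live, tombstones_read


# Three "days" whose rows have all expired, followed by one live day.
partitions = {
    day: [Cell(f"bid-{day}-{i}", expired=day < 4) for i in range(1000)]
    for day in range(1, 5)
}

rows, skipped = scan_limit(partitions, 10)
print(len(rows), skipped)  # 10 live rows, after reading 3000 tombstones

rows, skipped = read_partition(partitions, 4)
print(len(rows), skipped)  # 1000 live rows, 0 tombstones
```

In this model the LIMIT 10 scan wades through every expired row before it reaches live data, while the single-partition read never sees a tombstone. Scale that up to 20M rows per day and a timeout is not surprising.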
I found these articles to be particularly helpful and relevant:
http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
https://lostechies.com/ryansvihla/2014/10/20/domain-modeling-around-deletes-or-using-cassandra-as-a-queue-even-when-you-know-better/
And now that Cassandra has a date-based compaction strategy (DateTieredCompactionStrategy), you can do some smart modeling around deletes using that as well: http://www.datastax.com/dev/blog/datetieredcompactionstrategy
Upvotes: 2