Reputation: 6203
I’m inserting data into a collection to store user history (about 100 items per second), and querying the last hour of data using the aggregation framework (once a minute).
In order to keep my collection optimal, I'm considering two possible options: a capped collection, or a regular collection with a TTL index.
Which would be the more efficient solution, i.e. less demanding on the mongo boxes in terms of I/O, memory usage, CPU etc.? (I currently have 1 primary and 1 secondary, with a few hidden nodes, in case that makes a difference.)
(I’m OK with adding a bit of a buffer to my capped collection so that it stores 3-4 hours of data on average, and with not getting the full hour of data if users become very busy at certain times.)
Upvotes: 7
Views: 2407
Reputation: 1959
As this ranks quite high on Google, this answer should be updated with the official MongoDB statement that TTL indexes offer better performance:
Generally, TTL (Time To Live) indexes offer better performance and more flexibility than capped collections. TTL indexes expire and remove data from normal collections based on the value of a date-typed field and a TTL value for the index.
Capped collections serialize inserts and therefore have worse concurrent insert performance than non-capped collections. Before you create a capped collection, consider if you can use a TTL index instead.
The most common use case for a capped collection is to store log information. When the capped collection reaches its maximum size, old log entries are automatically overwritten with new entries.
https://www.mongodb.com/docs/manual/core/capped-collections/
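For illustration, here is a minimal sketch of setting up such a TTL index from Python. The field name `createdAt`, the one-hour expiry, and the database and collection names in the usage comment are assumptions chosen to match the question, not anything prescribed by the documentation. The helper only relies on the driver's `create_index` call with the `expireAfterSeconds` option, so it accepts any pymongo-style collection object.

```python
from datetime import datetime, timezone

ONE_HOUR_SECONDS = 60 * 60  # assumed retention window, taken from the question


def ensure_ttl_index(collection, field="createdAt", ttl_seconds=ONE_HOUR_SECONDS):
    """Create a single-field TTL index and return whatever the driver returns.

    `collection` is expected to behave like a pymongo Collection.
    """
    return collection.create_index([(field, 1)], expireAfterSeconds=ttl_seconds)


def new_history_doc(user_id, action, now=None):
    """Build a history document (hypothetical shape).

    The TTL field must hold a real date value, not a string or a number,
    otherwise the document never expires.
    """
    now = now or datetime.now(timezone.utc)
    return {"userId": user_id, "action": action, "createdAt": now}


# Usage against a real deployment (assumes pymongo is installed and a
# mongod is reachable on localhost; names are placeholders):
#   from pymongo import MongoClient
#   history = MongoClient()["app"]["user_history"]
#   ensure_ttl_index(history)
#   history.insert_one(new_history_doc(42, "login"))
```

Because the index is an ordinary single-field index on the timestamp, it can also serve range queries on that field.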
Upvotes: 1
Reputation: 69663
Using a capped collection will be more efficient. Capped collections preserve insertion order by not allowing documents to be deleted or updated in ways that increase their size, so MongoDB can always append to the current end of the collection. This makes insertion simpler and more efficient than with a standard collection.
A TTL index requires an additional index on the TTL field, which must be updated with every insert and therefore slows inserts down (this point is irrelevant, of course, if you would also add an index on the timestamp when using a capped collection). Also, the TTL is enforced by a background job that runs at regular intervals and consumes resources. The job is low-priority, and MongoDB is allowed to delay it when there are higher-priority tasks to do. That means you cannot rely on the TTL being enforced accurately. So when the exact time interval matters, you will have to include the time interval in your query even when you have a TTL set.
The big drawback of capped collections is that it is hard to anticipate how large they really need to be. If your application scales up and you receive many more or much larger documents than anticipated, you will begin to lose data. You should generally only use capped collections for cases where losing older documents prematurely is not that big of a deal.
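As a rough sketch of including the time interval in the query itself, the pipeline below starts with an explicit `$match` on the timestamp, so the result does not depend on when expired documents are actually removed. The field names `createdAt` and `userId` and the per-user count in the `$group` stage are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone


def last_hour_pipeline(now=None, field="createdAt"):
    """Return an aggregation pipeline restricted to the last hour.

    The $match stage enforces the window explicitly; the $group stage is
    a placeholder for whatever the real aggregation computes.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=1)
    return [
        {"$match": {field: {"$gte": cutoff}}},
        {"$group": {"_id": "$userId", "events": {"$sum": 1}}},
    ]


# Usage with a pymongo-style collection (placeholder name):
#   results = list(history.aggregate(last_hour_pipeline()))
```

The same `$match` stage works unchanged on a capped collection, where it filters out the extra buffer of older documents.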
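To make the sizing problem concrete, here is a back-of-the-envelope sketch. The insert rate and the 4-hour buffer come from the question; the average document size of 512 bytes is purely an assumed figure and should be replaced with a measured one.

```python
def capped_size_bytes(inserts_per_second, avg_doc_bytes, retention_hours, headroom=1.0):
    """Estimate the byte size a capped collection needs for a given retention.

    `headroom` is a multiplier for extra safety margin (1.0 means none).
    """
    docs = inserts_per_second * 3600 * retention_hours
    return int(docs * avg_doc_bytes * headroom)


# 100 inserts/s for 4 hours at an assumed 512 bytes per document.
size = capped_size_bytes(100, 512, 4)

# Usage with pymongo (placeholder database and collection names):
#   db.create_collection("user_history", capped=True, size=size)
```

If the insert rate or document size doubles, the same collection holds only half the time span, which is exactly the premature data loss described above.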
Upvotes: 13