Reputation: 1690
My use case will perhaps seem strange at first inspection, but I believe that, in principle, what I'm doing is a good way of scaling up massively in a short space of time without any impact on the live service.
Our live database runs on a 3-member multi-region replica set on Amazon EC2. We back up regularly by snapshotting the EBS journal and data volumes, so it is very easy to spin up standalone clones of our database from the most recent snapshot.

We periodically run some heavy/complex aggregation jobs that require us to do things programmatically that are not possible in the aggregation pipeline, and these need to pull large amounts of data from the database. We have found that pulling the data from active replica set members hampers performance, so we have been spinning up boxes with standalone mongo servers containing data from the latest snapshot. This works really nicely, though it seems to take around 30 minutes before the mongo servers become performant, which I guess is due to all the indices etc. being loaded into memory.
The thing is, I only actually want to access one or two collections from the database. I'm wondering if there is a way to prioritize the collections I wish to use, or else to drop the collections I don't want without first loading them into memory?
Upvotes: 1
Views: 136
Reputation: 27497
Part of what you are experiencing is a performance hit for new EBS volumes that have been created from a snapshot. From the EC2 documentation:
When you create a new EBS volume or restore a volume from a snapshot, the back-end storage blocks are allocated to you immediately. However, the first time you access a block of storage, it must be either wiped clean (for new volumes) or instantiated from its snapshot (for restored volumes) before you can access the block. This preliminary action takes time and can cause a 5 to 50 percent loss of IOPS for your volume the first time each block is accessed. For most applications, amortizing this cost over the lifetime of the volume is acceptable. Performance is restored after the data is accessed once.
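The usual remedy for this first-touch penalty is to read every block of the restored volume once before putting it into service. A minimal sketch, assuming the restored volume shows up as `/dev/xvdf` (the device name will vary):

```shell
# Initialize a volume restored from a snapshot by reading every block once,
# so that later I/O runs at full speed. /dev/xvdf is a placeholder device name.
if [ -b /dev/xvdf ]; then
  sudo dd if=/dev/xvdf of=/dev/null bs=1M
fi
```

On a large volume this takes a while, so it is typically scripted into the instance's provisioning step, before mongod is started.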
I can think of 3 ways to address this issue:
You could hit specific database files, as suggested in this MongoDB blog post: http://blog.mongodb.org/post/10407828262/cache-reheating-not-to-be-ignored
On a server restart, copy datafiles to /dev/null to force reheating to be sequential and thus much faster. This can be done even if the mongod process is already running. If the database is larger than RAM, copy only the newest datafiles (ones with the highest numbers); while this isn’t perfect, the latest files likely contain the largest percentage of frequently used data.
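The blog's suggestion can be scripted. A sketch, assuming an MMAPv1-era dbpath of `/data/db` (adjust to yours) and using modification time as a stand-in for "highest-numbered" datafiles, since the newest files normally have the highest numbers:

```shell
# Reheat the cache by sequentially reading the newest datafiles to /dev/null.
# DBPATH and the file count of 5 are assumptions; tune them to your deployment.
DBPATH=/data/db
ls -t "$DBPATH"/*.[0-9]* 2>/dev/null | head -n 5 | while read -r f; do
  dd if="$f" of=/dev/null bs=1M 2>/dev/null
done
```

As the post notes, this can be run even while the mongod process is already serving traffic.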
Upvotes: 1
Reputation: 2917
If I understand you correctly, you use a dedicated Amazon EC2 instance to pull some data out and do the aggregation on the client side.
When you start the EC2 instance, its memory is cold, i.e. no data or indexes are loaded yet. Once you begin to send queries to the box, only the data those queries access (and the relevant indexes) gets loaded into memory. So if you only use this box to retrieve specific data, only the data you need will be loaded. There is no need to prioritize, because only the required data is loaded.
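That said, if you do want to warm specific collections up front rather than waiting for the first queries to touch them, MongoDB (in the MMAPv1 era; the command was later removed along with that storage engine) had a `touch` command for exactly this. A sketch with hypothetical database and collection names:

```shell
# Preload a single collection's data and indexes into memory up front.
# "mydb" and "events" are hypothetical names; substitute your own.
if command -v mongo >/dev/null 2>&1; then
  mongo mydb --eval 'db.runCommand({ touch: "events", data: true, index: true })'
fi
```

Run once per collection you care about, right after the standalone mongod comes up.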
You mentioned that it affects performance. Could you explain what you mean? How much data are you trying to retrieve on the client?
Upvotes: 1