Dina

Reputation: 1406

Weird insertion times into MongoDB

I need to save lots of sensor measurements and I'm doing some benchmarks on MongoDB.

The data: This is the "schema" I'm using:

using System;
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

public class BetterConsolidatedTag
{
    public ObjectId Id { get; set; }

    /// <summary>
    /// The base time to which the offset values relate.
    /// </summary>
    public DateTime BaseTime { get; set; }

    /// <summary>
    /// The name of the data series.
    /// </summary>
    public string Name { get; set; }

    /// <summary>
    /// Values of the series in this time frame, keyed by their offset in
    /// milliseconds from BaseTime. [BsonElement] tells the driver to
    /// serialize this private property.
    /// </summary>
    [BsonElement]
    private SortedDictionary<int, object> OffsetValues { get; set; }
}

The idea is that instead of saving each measurement by itself, I consolidate all the measurements for a specific sensor over an hour. So each document represents all the measurements for a specific sensor over an hour which starts at BaseTime. There are two indexes defined: BaseTime_1_Name_1 and Name_1_BaseTime_1.
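For reference, this is roughly how the two indexes are created with the C# driver (a minimal sketch assuming driver 2.x; the connection string and the database/collection names are made up):

using MongoDB.Driver;

var client = new MongoClient("mongodb://myserver:27017");   // hypothetical connection string
var database = client.GetDatabase("sensors");               // hypothetical database name
var collection = database.GetCollection<BetterConsolidatedTag>("tags");

var keys = Builders<BetterConsolidatedTag>.IndexKeys;

// BaseTime_1_Name_1
collection.Indexes.CreateOne(new CreateIndexModel<BetterConsolidatedTag>(
    keys.Ascending(x => x.BaseTime).Ascending(x => x.Name)));

// Name_1_BaseTime_1
collection.Indexes.CreateOne(new CreateIndexModel<BetterConsolidatedTag>(
    keys.Ascending(x => x.Name).Ascending(x => x.BaseTime)));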

The database: MongoDB is running on Windows Server 2012 R2 Standard with the following hardware: [hardware specs]

The benchmark: For simplicity, my simulation generates data at a constant rate - I generate data representing one hour for all the different sensors and save it to the database, logging the time it takes to save this one-hour batch (which, as mentioned above, contains the same number of samples each time). The data was generated on my desktop (Windows 7 Enterprise, i7, 8GB RAM, SSD), which sent it to the MongoDB server over the network using the official MongoDB C# driver. I didn't do anything special with the desktop during the test - mainly internet browsing and plotting the measurements in Excel from time to time. No one but me was connected to the remote server during this time, and nothing but my benchmark was connected to MongoDB during the benchmarking.
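In rough terms, the timing loop looks like this (a simplified sketch, not my exact code; GenerateHourOfData, totalHours, and collection are stand-ins for my generator, the simulation length, and the collection handle):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using MongoDB.Driver;

var stopwatch = new Stopwatch();
var baseTime = new DateTime(2015, 1, 1, 0, 0, 0, DateTimeKind.Utc);

for (var hour = 0; hour < totalHours; hour++)
{
    // One document per sensor, covering the hour starting at this base time.
    List<BetterConsolidatedTag> batch = GenerateHourOfData(baseTime.AddHours(hour));

    stopwatch.Restart();
    collection.InsertMany(batch);   // one bulk insert per simulated hour
    stopwatch.Stop();

    // These (hour, milliseconds) pairs are what the graphs below plot.
    Console.WriteLine($"{hour},{stopwatch.ElapsedMilliseconds}");
}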

The results: This is a graph depicting insert time (in ms) as a function of the total number of samples in the database (the total size of the DB is about 200 GB, saved across 101 files): [Graph: sensor data insertion time]

And with some zoom: [Graph: sensor data insertion time, zoomed to lower values]

I'm having trouble understanding these results. I expected the insertion time to grow slightly over time - there are two indexes, and as the number of documents increases, maintaining them should take longer. I also expected that once the data no longer fits into physical memory and paging starts to occur more frequently, insertion times would become much higher, but the graph doesn't show a single point in time where things get worse. And what I really don't understand is why this graph looks as if it's made up of three different graphs:

- One which grows very slowly and holds almost all the data points (this is what I expected all the results to look like)
- One which grows faster and holds fewer points (maybe these are the times when paging occurs? But then I'd expect a "jump" in the graph at the point where physical memory filled up)
- One which grows insanely and holds about 40 data points. These points seem to occur at a constant rate of about every 15 minutes. I thought perhaps these are the times MongoDB creates a new file, but the data is saved across 101 files while there are only about 40 of these weird measurements.

Do these results make any sense? If not, what might be the problem? Should I look for mysterious background jobs on the server? A hardware problem?

EDIT: It doesn't make sense that the high points correspond to file creation, because creating a file shouldn't take longer as the data grows. Maybe there's some sort of compaction MongoDB is doing in the background? Something along the lines of small compactions most of the time (the second "graph") and occasional full compactions (the third and highest "graph"). It would make sense that compaction takes longer as the data grows. Or maybe garbage collection, which could also produce this kind of behavior, I guess?

EDIT 2: Well, MongoDB is written in C++, so I guess GC is out of the question. Right now background compaction is my best guess.

Upvotes: 2

Views: 264

Answers (1)

Maxim Krizhanovsky

Reputation: 26699

By default MongoDB flushes data to disk once a minute, so once a minute you will see a much slower insert - that's the point when the data is actually written to disk; the rest of the time it lives only in memory. Set it to sync every second (or on every write, if possible) and the graph will look different.
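The flush interval is controlled by the mongod --syncdelay option (60 seconds by default), so syncing every second would be --syncdelay 1. To request durability per write from the C# driver instead, one option is a journaled write concern - a sketch, assuming driver 2.x and reusing the collection and batch from the question:

using MongoDB.Driver;

// Acknowledge each write only after it has been committed to the
// on-disk journal, instead of relying on the periodic background flush.
var journaled = collection.WithWriteConcern(new WriteConcern(journal: true));
journaled.InsertMany(batch);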

Upvotes: 2
