Reputation: 935
I have a MongoDB collection with 1 billion records. It holds the last 15 days of logs from a telco SMSC node, which basically keeps the delivery status of all SMS sent through the SMSCs. I am struggling to select data based on a DateTime filter; it is really slow. It is also very slow when I try to delete a chunk of records from this collection to drop logs older than two weeks; the delete query literally never ends. Saying all this, the PC I am using is a pretty typical Dell desktop with a Core i7 processor and 4 GB of RAM. Any suggestions?
Upvotes: 2
Views: 5277
Reputation: 843
Based on your document structure and index, I highly suspect your index isn't being kept in memory. Three things I'd suggest:
In the mongo shell, run db.collection.stats().indexSizes. That will give you the size in bytes of all of the indexes for that collection. If that number is higher than your RAM (actually, even if it's >2GB, you're probably swapping), then your first step is to add enough RAM to keep your index in memory.
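To make the check concrete, here is a sketch of what the output looks like and how to compare it against your RAM. The collection name and the byte counts below are made up for illustration; run the shell command against your own collection to get real numbers.

```javascript
// In the mongo shell: db.mycollection.stats().indexSizes
// returns an object mapping index name -> size in bytes, e.g. (hypothetical numbers):
const indexSizes = {
  "_id_": 9000000000,                                                        // ~9 GB
  "MESSAGE_ID_1_CDR_TYPE_1_SUB_TIME_1_DEL_TIME_1_END_POINT_1": 60000000000   // ~60 GB
};

// Sum all index sizes and compare to available RAM (4 GB in the question).
const totalIndexBytes = Object.values(indexSizes).reduce((a, b) => a + b, 0);
const ramBytes = 4 * 1024 ** 3;

console.log(`indexes: ${(totalIndexBytes / 1024 ** 3).toFixed(1)} GB, RAM: 4 GB`);
console.log(totalIndexBytes > ramBytes
  ? "indexes cannot fit in RAM -> expect heavy disk I/O"
  : "indexes fit in RAM");
```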
Second, are you sure you need a compound index? That is, are you running a lot of queries in which you use all those fields? Or do you do that just to ensure uniqueness? If all your queries are just on the DEL_TIME field, then having a simple index just on that field will reduce your index space requirement.
Third, have you run your query with the explain() option? You need to do this directly in the mongo shell. That will tell you if your query is actually using the index. Just looking at the query, I think it should be, but you never know until you check.
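When reading explain() output, the thing to look for is the winning plan's stage tree. A trimmed-down, hypothetical result is sketched below (the index name is an assumption); "IXSCAN" means the index is being used, while "COLLSCAN" means a full collection scan.

```javascript
// In the mongo shell (not run here):
//   db.mycollection.find({ DEL_TIME: { $gte: "...", $lte: "..." } }).explain()
// A trimmed-down, hypothetical queryPlanner section:
const explainOutput = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "DEL_TIME_1" }
    }
  }
};

// Walk the plan tree and collect the stage names.
function stages(plan) {
  const out = [plan.stage];
  if (plan.inputStage) out.push(...stages(plan.inputStage));
  return out;
}

const planStages = stages(explainOutput.queryPlanner.winningPlan);
console.log(planStages.includes("IXSCAN")
  ? "query uses an index"
  : "full collection scan");
```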
Upvotes: 1
Reputation: 935
Here is a sample document from the collection:
{ "_id" : ObjectId("56eacd643f8621ca653d5bf3"), "Node" : "torsmsc11", "MESSAGE_ID" : "1264F954", "CDR_TYPE" : "Initial MO", "SUB_TIME" : "2016-03-17 08:59:50", "DEL_TIME" : "2016-03-17 08:59:50", "OA_ADDR" : NumberLong("16477392921"), "PRE_TRANS_OA" : NumberLong("16477392921"), "DA_ADDR" : NumberLong("16472202975"), "PRE_TRANS_DA" : NumberLong("16472202975"), "ORIG_LOCN" : NumberLong("161350003000"), "ORIG_IDNT" : NumberLong("302490201234882"), "DEST_LOCN" : "UNKNOWN", "DEST_IDNT" : "UNKNOWN", "SEG_NUM" : "1 of 1", "DLV_ATT" : 0, "END_POINT" : "STORAGE", "FINAL_STATE" : "DELIVERED", "CDR_TYPE2" : "MO", "DCS" : 0 }
I have created a unique compound index on these fields:
MESSAGE_ID, CDR_TYPE, SUB_TIME, DEL_TIME, END_POINT
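One thing worth noting about that index (collection name below is an assumption): a compound index supports filters on a prefix of its keys, so a query filtering only on DEL_TIME cannot use it efficiently, because DEL_TIME is the fourth key.

```javascript
// The compound unique index as described (shell command, not run here):
//   db.mycollection.createIndex(
//     { MESSAGE_ID: 1, CDR_TYPE: 1, SUB_TIME: 1, DEL_TIME: 1, END_POINT: 1 },
//     { unique: true }
//   );
const compoundKey = ["MESSAGE_ID", "CDR_TYPE", "SUB_TIME", "DEL_TIME", "END_POINT"];

// An index supports filters on a *prefix* of its keys. A filter on
// DEL_TIME alone only matches a prefix if DEL_TIME comes first:
const usableForDelTimeOnly = compoundKey[0] === "DEL_TIME";
console.log(usableForDelTimeOnly); // false -> a separate { DEL_TIME: 1 } index would help
```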
Running this query from SSRS (Microsoft's BI tool):
SELECT TOP 1000 * FROM mycollection WHERE DEL_TIME BETWEEN '2016-03-17 08:59:50' AND '2016-03-17 09:59:50'
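For reference, a native MongoDB equivalent of that SQL might look like the sketch below. Because DEL_TIME is stored as a "YYYY-MM-DD HH:MM:SS" string, the range comparison is lexicographic, which happens to match chronological order for this format.

```javascript
// Native MongoDB equivalent (shell, not run here):
//   db.mycollection.find({
//     DEL_TIME: { $gte: "2016-03-17 08:59:50", $lte: "2016-03-17 09:59:50" }
//   }).limit(1000)

// String comparison on this timestamp format is chronological:
const doc = { DEL_TIME: "2016-03-17 08:59:50" }; // the sample document's value
const lo = "2016-03-17 08:59:50", hi = "2016-03-17 09:59:50";
const inRange = doc.DEL_TIME >= lo && doc.DEL_TIME <= hi;
console.log(inRange); // true
```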
Upvotes: 1
Reputation: 843
Can you post your document structure?
I suspect two things. Firstly, for 1 billion records, I'm thinking you're severely starved of RAM. MongoDB performance falls off a cliff if you don't keep your working set in memory. At the very least, that includes your indexes. But to get good performance, you should also include enough for your working set (i.e. the number of documents your DB routinely accesses, say for example if your queries generally work on a million of those documents at any given time). Even a tiny index (like the primary index of an _id field) for 1 billion records will take far more than 4GB.
Secondly, are you sure you have the appropriate indexes? And that your queries are using your indexes? It seems like you'll want an index on the timestamp field, plus whatever field you might query on (e.g. if your delete query also includes other search terms besides the timestamp).
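For the two-week purge in the question, an index on the timestamp field lets the delete target only the old range. Below is a sketch (collection and field names are taken from the sample document; the shell commands are illustrative, not run here), plus the cutoff-string computation in the same "YYYY-MM-DD HH:MM:SS" format the logs use.

```javascript
// Shell commands (illustrative):
//   db.mycollection.createIndex({ SUB_TIME: 1 });
//   db.mycollection.deleteMany({ SUB_TIME: { $lt: cutoff } });

// Build the cutoff string for "N days before `now`" in the log's format:
function cutoffString(now, days) {
  const d = new Date(now.getTime() - days * 24 * 60 * 60 * 1000);
  const pad = n => String(n).padStart(2, "0");
  return `${d.getUTCFullYear()}-${pad(d.getUTCMonth() + 1)}-${pad(d.getUTCDate())} ` +
         `${pad(d.getUTCHours())}:${pad(d.getUTCMinutes())}:${pad(d.getUTCSeconds())}`;
}

const cutoff = cutoffString(new Date("2016-03-17T09:00:00Z"), 15);
console.log(cutoff); // "2016-03-02 09:00:00"
```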
First step I'd suggest is getting the size of your indexes. You can do this in the mongo shell by typing db.collection.stats().indexSizes. The minimum RAM you need is your index sizes + some amount for your working set.
Once you have enough RAM, next ensure that your queries are using your indexes. You can use Mongo's explain() feature to look at the queryplan for any query, and you can find out if you're actually hitting the indexes or if Mongo is bypassing them and doing a full document search.
If you've defined appropriate indexes, have enough memory to hold your working set (indexes + the set of documents you usually access), and are sure that your queries are using your index, then, it may be appropriate to move to other strategies like sharding. But given your current computer stats (especially the 4GB of RAM), I suspect you'll go a long way with just the first steps listed above.
Upvotes: 3
Reputation: 2684
These are the key points:
Create Indexes to Support Queries
Limit the Number of Query Results to Reduce Network Demand
Use Projections to Return Only Necessary Data
Use $hint to Select a Particular Index
Use the Increment Operator to Perform Operations Server-Side
You can find an explanation in the MongoDB docs: https://docs.mongodb.org/manual/tutorial/optimize-query-performance-with-indexes-and-projections/
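As a sketch of the last point: instead of reading a counter, incrementing it client-side, and writing it back, let the server do the increment with $inc. The filter below uses the MESSAGE_ID from the sample document; the update call is illustrative and not run here.

```javascript
// Server-side increment (shell, not run here):
//   db.mycollection.updateOne({ MESSAGE_ID: "1264F954" }, { $inc: { DLV_ATT: 1 } });

// The semantics of $inc, sketched locally:
function applyInc(doc, incSpec) {
  const out = { ...doc };
  for (const [field, amount] of Object.entries(incSpec)) {
    out[field] = (out[field] || 0) + amount; // missing fields start at 0
  }
  return out;
}

const before = { MESSAGE_ID: "1264F954", DLV_ATT: 0 };
const after = applyInc(before, { DLV_ATT: 1 });
console.log(after.DLV_ATT); // 1
```

Doing this server-side avoids the read-modify-write round trip and the race between concurrent clients.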
Upvotes: 5