Reputation: 4662
I want to know why the following search in MongoDB (C#) takes 50 seconds to execute.
I followed the basic idea of http://calv.info/indexing-schemaless-documents-in-mongo/
I have 100,000 records in a collection (captures). Each document has a collection of SearchTerms:
public class SearchTerm
{
    public string Key { get; set; }
    public object Value { get; set; }
}

public class Capture
{
    // Some other fields
    public IList<SearchTerm> SearchTerms { get; set; }
}
I have also defined an index like so:
var capturesCollection = database.GetCollection<Capture>("captures");
capturesCollection.CreateIndex("SearchTerms.Key", "SearchTerms.Value");
But the following query takes 50 seconds to execute:
var query = Query.Or(
    Query.And(
        Query.EQ("SearchTerms.Key", "ClientId"),
        Query.EQ("SearchTerms.Value", selectedClient.Id)),
    Query.And(
        Query.EQ("SearchTerms.Key", "CustomerName"),
        Query.EQ("SearchTerms.Value", "Jan")));
var selectedCapture = capturesCollection.Find(query).ToList();
Edit: As asked, here is my explain output:
clauses: [{ "cursor" : "BtreeCursor SearchTerms.Key_1_SearchTerms.Value_1", "isMultiKey" : true, "n" : 10003, "nscannedObjects" : 100000, "nscanned" : 100000, "scanAndOrder" : false, "indexOnly" : false, "nChunkSkips" : 0, "indexBounds" : { "SearchTerms.Key" : [["ClientId", "ClientId"]], "SearchTerms.Value" : [[{ "$minElement" : 1 }, { "$maxElement" : 1 }]] } }, { "cursor" : "BtreeCursor SearchTerms.Key_1_SearchTerms.Value_1", "isMultiKey" : true, "n" : 70328, "nscannedObjects" : 90046, "nscanned" : 211653, "scanAndOrder" : false, "indexOnly" : false, "nChunkSkips" : 0, "indexBounds" : { "SearchTerms.Key" : [["CustomerName", "CustomerName"]], "SearchTerms.Value" : [[{ "$minElement" : 1 }, { "$maxElement" : 1 }]] } }]
cursor: QueryOptimizerCursor
n: 73219
nscannedObjects: 190046
nscanned: 311653
nscannedObjectsAllPlans: 190046
nscannedAllPlans: 311653
scanAndOrder: false
nYields: 2436
nChunkSkips: 0
millis: 5196
server: piro-pc:27017
filterSet: false
stats: { "type" : "KEEP_MUTATIONS", "works" : 311655, "yields" : 2436, "unyields" : 2436, "invalidates" : 0, "advanced" : 73219, "needTime" : 238435, "needFetch" : 0, "isEOF" : 1, "children" : [{ "type" : "OR", "works" : 311655, "yields" : 2436, "unyields" : 2436, "invalidates" : 0, "advanced" : 73219, "needTime" : 238435, "needFetch" : 0, "isEOF" : 1, "dupsTested" : 80331, "dupsDropped" : 7112, "locsForgotten" : 0, "matchTested_0" : 0, "matchTested_1" : 0, "children" : [{ "type" : "FETCH", "works" : 100001, "yields" : 2436, "unyields" : 2436, "invalidates" : 0, "advanced" : 10003, "needTime" : 89997, "needFetch" : 0, "isEOF" : 1, "alreadyHasObj" : 0, "forcedFetches" : 0, "matchTested" : 10003, "children" : [{ "type" : "IXSCAN", "works" : 100000, "yields" : 2436, "unyields" : 2436, "invalidates" : 0, "advanced" : 100000, "needTime" : 0, "needFetch" : 0, "isEOF" : 1, "keyPattern" : "{ SearchTerms.Key: 1, SearchTerms.Value: 1 }", "boundsVerbose" : "field #0['SearchTerms.Key']: [\"ClientId\", \"ClientId\"], field #1['SearchTerms.Value']: [MinKey, MaxKey]", "isMultiKey" : 1, "yieldMovedCursor" : 0, "dupsTested" : 100000, "dupsDropped" : 0, "seenInvalidated" : 0, "matchTested" : 0, "keysExamined" : 100000, "children" : [] }] }, { "type" : "FETCH", "works" : 211654, "yields" : 2436, "unyields" : 2436, "invalidates" : 0, "advanced" : 70328, "needTime" : 141325, "needFetch" : 0, "isEOF" : 1, "alreadyHasObj" : 0, "forcedFetches" : 0, "matchTested" : 70328, "children" : [{ "type" : "IXSCAN", "works" : 211653, "yields" : 2436, "unyields" : 2436, "invalidates" : 0, "advanced" : 90046, "needTime" : 121607, "needFetch" : 0, "isEOF" : 1, "keyPattern" : "{}", "boundsVerbose" : "field #0['SearchTerms.Key']: [\"CustomerName\", \"CustomerName\"], field #1['SearchTerms.Value']: [MinKey, MaxKey]", "isMultiKey" : 1, "yieldMovedCursor" : 0, "dupsTested" : 211653, "dupsDropped" : 121607, "seenInvalidated" : 0, "matchTested" : 0, "keysExamined" : 211653, "children" : [] }] }] }] }
Upvotes: 0
Views: 1138
Reputation: 11671
Thanks for posting the explain. Let's address the problems one at a time.
First, I don't think this query does what you think it does, or what you want it to do. Let me show you by example using the mongo shell. Your query, translated into the shell, is:
{ "$or" : [
{ "$and" : [
{ "SearchTerms.Key" : "ClientId" },
{ "SearchTerms.Value" : "xxx" }
]},
{ "$and" : [
{ "SearchTerms.Key" : "CustomerName" },
{ "SearchTerms.Value" : "Jan" }
]}
]}
This query finds documents where either some Key has the value "ClientId" and some Value has the value "xxx", or some Key has the value "CustomerName" and some Value has the value "Jan". The key and the value don't need to be part of the same array element. For example, the following document matches your query:
{ "SearchTerms" : [
{ "Key" : "ClientId", "Value" : 691 },
{ "Key" : "banana", "Value" : "xxx" }
]
}
I'm guessing your desired behavior is to match exactly the documents that contain the Key and Value in the same array element. The $elemMatch operator is the tool for the job:
{ "$or" : [
{ "SearchTerms" : { "$elemMatch" : { "Key" : "ClientId", "Value" : "xxx" } } },
{ "SearchTerms" : { "$elemMatch" : { "Key" : "CustomerName", "Value" : "Jan" } } }
]}
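In the legacy C# driver the question uses, the same $elemMatch query can be built with Query.ElemMatch. A sketch, reusing the variables from the question (selectedClient and capturesCollection are assumed to be defined as in your code):

```csharp
// Match only documents where Key and Value occur in the SAME array element
var query = Query.Or(
    Query.ElemMatch("SearchTerms", Query.And(
        Query.EQ("Key", "ClientId"),
        Query.EQ("Value", selectedClient.Id))),
    Query.ElemMatch("SearchTerms", Query.And(
        Query.EQ("Key", "CustomerName"),
        Query.EQ("Value", "Jan"))));
var selectedCaptures = capturesCollection.Find(query).ToList();
```

Note that inside ElemMatch the field names are relative to the array element ("Key", not "SearchTerms.Key").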
Second, I don't think this schema is what you are looking for. You don't describe your use case, so I can't be sure, but the situation described in that blog post is a very rare one: needing to store and search on arbitrary key-value pairs that can change from one document to the next, e.g. letting users put in custom metadata. Almost no applications want or need to do this.

It looks like your application is storing information about customers, probably for an internal system. You should be able to define a data model for your customers that looks like:
{
    "CustomerId" : 1234,
    "CustomerName" : "Jan",
    "ClientId" : "xpj1234",
    ...
}
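In C#, that flattened model might look like the sketch below (field names are taken from the example document above; the index and query shapes are illustrative, not a drop-in replacement):

```csharp
public class Capture
{
    public int CustomerId { get; set; }
    public string CustomerName { get; set; }
    public string ClientId { get; set; }
    // ...other fields
}

// Index the concrete fields you actually query on:
capturesCollection.CreateIndex("ClientId");
capturesCollection.CreateIndex("CustomerName");

// The query becomes a straightforward match on named fields:
var query = Query.Or(
    Query.EQ("ClientId", selectedClient.Id),
    Query.EQ("CustomerName", "Jan"));
```

With concrete fields, each index covers both the "key" and the "value" at once, so the planner no longer has to scan every entry under a given Key as it does in your explain output.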
This will simplify and improve things dramatically. I think the wires got crossed here because MongoDB is sometimes called "schemaless" and the blog post talks about "schemaless" documents. The blog post really is talking about documents whose contents you can't know in advance. Most applications should know pretty much exactly what the general structure of the documents in a collection will be.
Finally, I think on this basis we can set the slow-query issue aside for now. Feel free to ask another question, or edit this one with more detail, if you need more help or if the problem doesn't go away once you've taken into account what I've said here.
Upvotes: 1
Reputation: 2064
1) Take a look at the mongodb log file and see what query gets generated against the database.
2) Enter that query into the mongo shell and add ".explain()" at the end, and see whether your index is actually being used (does it say BasicCursor or BtreeCursor?).
3) If your index is used, what's the value of the "nscanned" attribute? Perhaps your index does not have enough "value diversity" in it?
Upvotes: 0