user509981
user509981

Reputation:

What index to set for a OR query in mongodb

I have my collection version that represent version of an application :

{
    application_id: ObjectId(42),
    version: "1.2",
    price: 1.000000
}

In my database I already have like 400k documents inserted, I have 10k new documents (some might be already in database, other not) and I would like to retrieve all the documents already inserted to get their id for further computation, and then insert only the new document.

** EDIT

To be more precise, this is the algorithm I'm doing :

Hope that's more clear

** END EDIT

So I do a OR request :

{
    "$or":[
        {
            "application_id": 12,
            "version": "1.2"
        },
        {
            "application_id": 13,
            "version": "2.6"
        },
        .... for 10k
    ]
}

When I have no index I get a responses in something like few minutes (5 or 6), so I decided to set an index to improve it :

db.Version.ensureIndex({"application_id":1,"version":1})

I re-index the collection, but computation time is way worst ! I stopped after 10 minutes...

So my question is do you think it's a good idea to make a OR request, or should I split it in 10k simple request, and if a OR request is a good idea what's wrong with my indexes ??

Upvotes: 0

Views: 66

Answers (1)

Sammaye
Sammaye

Reputation: 43884

Man, 10,000 $ors do you really need that many? That must be getting close to the BSON max size.

Your scenario of doing the query is unclear so I cannot recommend a better way currently. However application versioning can normally be done better than this.

As for your question(s):

So my question is do you think it's a good idea to make a OR request

Probably not actually. In this case MongoDB is having to merge like hell and that is probably what's taking so long.

or should I split it in 10k simple request, and if a OR request is a good idea what's wrong with my indexes ??

Your index looks like it should cover so that's not the problem.

If you REALLY have to do this query you might be better off lazy loading the page (I assume website here) and doing the queries in pieces. 10,000 is a lot of queries to send to the server in a very small period of time.

So for example versioning on Github is truncated after about 10 releases and you must click an Ajax link to get another 10 releases.

Upvotes: 1

Related Questions