Albert

Reputation: 2664

Optimising a MongoDB query

I have a large collection that is meant to store millions of documents.

A typical document may be quite complex and dynamic, but a few constant fields should be present in each of them: globalDeviceStatus, ManualTests, SemiAutomaticTests and AutomaticTests (all nested under data). All three types of tests are represented by arrays of objects. Each such object may contain quite a few fields, but again some are constant: componentName and componentTestStatus.

{
    "data": {
        "globalDeviceStatus": false,
        "qaOfficerID": 12121,
        "ManualTests": [{
                "componentName": "camera",
                "componentTestStatus": true,
                "x": 10
            },
            {
                "componentName": "wifi",
                "componentTestStatus": false,
                "mnum": 711
            }
        ],
        "SemiAutomaticTests": [{
                "componentName": "someComponent",
                "componentTestStatus": true,
                "someParameter": true
            },
            {
                "componentName": "oneMoreComponent",
                "componentTestStatus": false
            }
        ],
        "AutomaticTests": [{
                "componentName": "anotherComponent",
                "componentTestStatus": true
            },
            {
                "componentName": "someVeryImportantComponent",
                "componentTestStatus": false
            }
        ]
    },
    "userID": 1
}

Each document represents a test. If globalDeviceStatus is false, the test failed, which also means that its JSON is expected to contain at least one failed component (conversely, tests with globalDeviceStatus equal to true contain no failed components). What I need is to calculate the number of failures for each component; that is, as output I need something like this:

{
    "componentName": 120,
    "someOtherComponentName": 31
}
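To make the expected result concrete, here is a minimal pure-Python sketch of the same count, computed in memory over a trimmed copy of the sample document above. The names count_failures, TEST_ARRAYS and sample are hypothetical helpers for illustration only, and this is meant as a way to check results on small data, not as a replacement for a server-side query.

```python
from collections import Counter

# The three test arrays named in the question (all nested under "data").
TEST_ARRAYS = ("ManualTests", "SemiAutomaticTests", "AutomaticTests")


def count_failures(documents):
    """Count failed components (componentTestStatus is False) per componentName."""
    counts = Counter()
    for doc in documents:
        data = doc.get("data", {})
        for array_name in TEST_ARRAYS:
            for test in data.get(array_name, []):
                # Only an explicit False counts as a failure.
                if test.get("componentTestStatus") is False:
                    counts[test["componentName"]] += 1
    return dict(counts)


# Trimmed version of the sample document from the question.
sample = [{
    "data": {
        "globalDeviceStatus": False,
        "ManualTests": [
            {"componentName": "camera", "componentTestStatus": True},
            {"componentName": "wifi", "componentTestStatus": False},
        ],
        "SemiAutomaticTests": [
            {"componentName": "someComponent", "componentTestStatus": True},
            {"componentName": "oneMoreComponent", "componentTestStatus": False},
        ],
        "AutomaticTests": [
            {"componentName": "anotherComponent", "componentTestStatus": True},
            {"componentName": "someVeryImportantComponent", "componentTestStatus": False},
        ],
    },
    "userID": 1,
}]

print(count_failures(sample))
```

For the single sample document, each of the three failed components is counted once.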

Every componentName may belong to only one test type. That is to say, if it appears in SemiAutomaticTests in one document, it cannot move to AutomaticTests in another.

To do these calculations I use the following aggregation pipeline:

COUNT_CRASHES = [
            {
                "$match": {
                    "$or": [{
                        "data.ManualTests.componentTestStatus": false
                    }, {
                        "data.AutomaticTests.componentTestStatus": false
                    }, {
                        "data.SemiAutomaticTests.componentTestStatus": false
                    }]
                }
            }, {
                "$project": {
                    "tests": {
                        "$concatArrays": [{
                            "$filter": {
                                "input": "$data.ManualTests",
                                "as": "mt",
                                "cond": {
                                    "$eq": ["$$mt.componentTestStatus", false]
                                }
                            }
                        }, {
                            "$filter": {
                                "input": "$data.AutomaticTests",
                                "as": "at",
                                "cond": {
                                    "$eq": ["$$at.componentTestStatus", false]
                                }
                            }
                        }, {
                            "$filter": {
                                "input": "$data.SemiAutomaticTests",
                                "as": "st",
                                "cond": {
                                    "$eq": ["$$st.componentTestStatus", false]
                                }
                            }
                        }]
                    }
                }
            }, {
                "$unwind": "$tests"
            }, {
                "$group": {
                    "_id": "$tests.componentName",
                    "count": {
                        "$sum": 1
                    }
                }
            }
        ]

It returns the data in a format different from the one specified above, but that's not important. What really matters right now is that it takes about 7 seconds, and sometimes twice as long (~14 seconds), to return. That's with 350k documents in the DB.
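As for the format difference: the $group stage yields one document per component, shaped like {"_id": <componentName>, "count": <n>}. A minimal sketch of folding those rows into the single object shown earlier could look like this on the client side. The function name reshape_counts and the example rows are hypothetical, with counts taken from the desired-output example rather than from real data.

```python
def reshape_counts(group_results):
    """Fold $group output rows into one {componentName: count} mapping."""
    return {row["_id"]: row["count"] for row in group_results}


# Illustrative rows in the shape produced by the $group stage.
example_rows = [
    {"_id": "componentName", "count": 120},
    {"_id": "someOtherComponentName", "count": 31},
]

print(reshape_counts(example_rows))
```

Since the number of distinct components is small compared to the number of documents, this step should be negligible next to the aggregation itself.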

I would like to reduce the time as much as possible.

Upvotes: 0

Views: 47

Answers (1)

dnickless

Reputation: 10918

Unless you restructure your documents so that "ManualTests", "AutomaticTests" and "SemiAutomaticTests" become field values rather than field names (which would likely allow for a leaner pipeline), you will probably need to create three indexes like this to speed up the $match:

db.collection.createIndex({ "data.ManualTests.componentTestStatus": 1 })
db.collection.createIndex({ "data.AutomaticTests.componentTestStatus": 1 })
db.collection.createIndex({ "data.SemiAutomaticTests.componentTestStatus": 1 })

Also note that your projection can be shortened to:

"$project": {
    "tests": {
        "$filter": {
            "input": { "$concatArrays": [ "$data.ManualTests", "$data.AutomaticTests", "$data.SemiAutomaticTests" ] },
            "as": "t",
            "cond": {
                "$eq": ["$$t.componentTestStatus", false]
            }
        }
    }
}
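Put together, the whole pipeline with the shortened projection might be built like this from Python. This is only a sketch: build_pipeline and TEST_ARRAYS are hypothetical names, and it assumes a PyMongo-style driver where the list is passed to the collection's aggregate method.

```python
# Array fields under "data", in the same order as in the question's pipeline.
TEST_ARRAYS = ["ManualTests", "AutomaticTests", "SemiAutomaticTests"]


def build_pipeline():
    """Build the failure-count pipeline using a single $filter over $concatArrays."""
    status_paths = [f"data.{name}.componentTestStatus" for name in TEST_ARRAYS]
    return [
        # Keep only documents with at least one failed component in any array.
        {"$match": {"$or": [{path: False} for path in status_paths]}},
        # Merge the three arrays first, then filter the failures once.
        {"$project": {
            "tests": {
                "$filter": {
                    "input": {"$concatArrays": [f"$data.{name}" for name in TEST_ARRAYS]},
                    "as": "t",
                    "cond": {"$eq": ["$$t.componentTestStatus", False]},
                }
            }
        }},
        {"$unwind": "$tests"},
        {"$group": {"_id": "$tests.componentName", "count": {"$sum": 1}}},
    ]


pipeline = build_pipeline()
# Assumed usage with a PyMongo collection object:
# results = list(collection.aggregate(pipeline))
```

One thing to keep in mind: $concatArrays returns null if any of its arguments is null or missing, so this relies on all three arrays being present in every document, as described in the question.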

Upvotes: 2
