Alex Zhukovskiy
Alex Zhukovskiy

Reputation: 10015

Querying MongoDB to return only document with single value with matching condition

I want MongoDB to return an entire document if it has exactly one element in array matching the condition. I have written following:

db.myCollection.find({ $where: "this.Tags.filter(x => x.indexOf(':') < 0).length === 1" })

It works fine, except that it's slow as hell, because $where clause doesn't use indicies.

Could this query be rewritten somehow to normal find/match/... operations that are aware of indicies or this is the only way to perform such an operation? I can defenitly add some field like NumberOfTagsThatDoNotContainSemicolon, but my question is about more generic approach that doesn't require changes to how data is inserted.

Upvotes: 1

Views: 47

Answers (2)

dnickless
dnickless

Reputation: 10918

Here's a much more concise version that doesn't need any unwinds:

db.myCollection.aggregate([
{
    $addFields: { // we want to add new field...
        "NumberOfTagsWithoutSemiColon": {
            $size: { // ...that shall contain the number...
                $filter: {
                    input: "$Tags", // ...of all tags...
                    cond: {
                        $eq: // ...that do not contain a semicolon
                        [
                            { $indexOfBytes: [ "$$this", ":" ] },
                            -1
                        ]
                    }
                }
            }
        }
    }
}, {
    $match: {
        "NumberOfTagsWithoutSemiColon": 1 // we only keep the ones where 
    }
}, {
    $project: {
        "NumberOfTagsWithoutSemiColon": 0
    }
}])

Upvotes: 1

Alex Zhukovskiy
Alex Zhukovskiy

Reputation: 10015

After several hours of googling and stackoverflowing i have written following solution:

db.myCollection.aggregate([
    { $match : { "Tags": ":image" } },
    { $unwind : "$Tags" },
    { $match : { "Tags": /^[^:]+$/ } },
    { $group : { _id : "$_id", doc: { "$first": "$$ROOT" }, count: { $sum : 1} }} ,
    { $match : { "count": 1 } },
    { $replaceRoot : {newRoot: "$doc"} },
    { $addFields : { Tags : [ "$Tags" ] } } // we unwinded all tags, so we convert this field back to an array, otherwise we can get type error
])

It works 10x faster than original code : 3 sec vs 31 sec on my machine.

Sample input

{
    "_id" : ObjectId("53396223ec8bd02674b1208c"),
    "UploadDate" : ISODate("2014-03-31T12:40:03.834Z"),
    "Tags" : [ 
        "cars", 
        " car_diler", 
        " autodiler", 
        " auto", 
        " audi", 
        ":image"
    ]
},
{
    "_id" : ObjectId("53396223ec8bd02674b1208d"),
    "UploadDate" : ISODate("2014-03-31T12:40:03.835Z"),
    "Tags" : [ 
        ":image"
    ]
},
{
    "_id" : ObjectId("53396223ec8bd02674b1208e"),
    "UploadDate" : ISODate("2014-03-31T12:40:03.835Z"),
    "Tags" : [ 
        "cars", 
        ":image"
    ]
},
{
    "_id" : ObjectId("53396223ec8bd02674b1208f"),
    "UploadDate" : ISODate("2014-03-31T12:40:03.835Z"),
    "Tags" : [ 
        "something",
        ":image",
        ":somethingelse"
    ]
},
{
    "_id" : ObjectId("53396223ec8bd02674b120ff"),
    "UploadDate" : ISODate("2014-03-31T12:40:03.835Z"),
    "Tags" : [ 
        "something",
        ":somethingelse"
    ]
}

Current output:

{
    "_id" : ObjectId("53396223ec8bd02674b1208e"),
    "UploadDate" : ISODate("2014-03-31T12:40:03.835Z"),
    "Tags" : [ 
        "cars"
    ]
},
{
    "_id" : ObjectId("53396223ec8bd02674b1208f"),
    "UploadDate" : ISODate("2014-03-31T12:40:03.835Z"),
    "Tags" : [ 
        "something"
    ]
}

Desired output:

{
    "_id" : ObjectId("53396223ec8bd02674b1208e"),
    "UploadDate" : ISODate("2014-03-31T12:40:03.835Z"),
    "Tags" : [ 
        "cars", 
        ":image"
    ]
},
{
    "_id" : ObjectId("53396223ec8bd02674b1208f"),
    "UploadDate" : ISODate("2014-03-31T12:40:03.835Z"),
    "Tags" : [ 
        "something",
        ":image",
        ":somethingelse"
    ]
}

As you see, I loose here all tags starting with :. It's good enough in my case, but it could be important for someone else. I could collect IDs first and then query them, but it's crucial to perform all operations in one single query.

Upvotes: 1

Related Questions