Reputation: 10015
I want MongoDB to return an entire document if it has exactly one element in array matching the condition. I have written following:
db.myCollection.find({ $where: "this.Tags.filter(x => x.indexOf(':') < 0).length === 1" })
It works fine, except that it's slow as hell, because $where
clause doesn't use indicies.
Could this query be rewritten somehow to normal find
/match
/...
operations that are aware of indicies or this is the only way to perform such an operation? I can defenitly add some field like NumberOfTagsThatDoNotContainSemicolon
, but my question is about more generic approach that doesn't require changes to how data is inserted.
Upvotes: 1
Views: 47
Reputation: 10918
Here's a much more concise version that doesn't need any unwind
s:
db.myCollection.aggregate([
{
$addFields: { // we want to add new field...
"NumberOfTagsWithoutSemiColon": {
$size: { // ...that shall contain the number...
$filter: {
input: "$Tags", // ...of all tags...
cond: {
$eq: // ...that do not contain a semicolon
[
{ $indexOfBytes: [ "$$this", ":" ] },
-1
]
}
}
}
}
}
}, {
$match: {
"NumberOfTagsWithoutSemiColon": 1 // we only keep the ones where
}
}, {
$project: {
"NumberOfTagsWithoutSemiColon": 0
}
}])
Upvotes: 1
Reputation: 10015
After several hours of googling and stackoverflowing i have written following solution:
db.myCollection.aggregate([
{ $match : { "Tags": ":image" } },
{ $unwind : "$Tags" },
{ $match : { "Tags": /^[^:]+$/ } },
{ $group : { _id : "$_id", doc: { "$first": "$$ROOT" }, count: { $sum : 1} }} ,
{ $match : { "count": 1 } },
{ $replaceRoot : {newRoot: "$doc"} },
{ $addFields : { Tags : [ "$Tags" ] } } // we unwinded all tags, so we convert this field back to an array, otherwise we can get type error
])
It works 10x faster than original code : 3 sec vs 31 sec on my machine.
Sample input
{
"_id" : ObjectId("53396223ec8bd02674b1208c"),
"UploadDate" : ISODate("2014-03-31T12:40:03.834Z"),
"Tags" : [
"cars",
" car_diler",
" autodiler",
" auto",
" audi",
":image"
]
},
{
"_id" : ObjectId("53396223ec8bd02674b1208d"),
"UploadDate" : ISODate("2014-03-31T12:40:03.835Z"),
"Tags" : [
":image"
]
},
{
"_id" : ObjectId("53396223ec8bd02674b1208e"),
"UploadDate" : ISODate("2014-03-31T12:40:03.835Z"),
"Tags" : [
"cars",
":image"
]
},
{
"_id" : ObjectId("53396223ec8bd02674b1208f"),
"UploadDate" : ISODate("2014-03-31T12:40:03.835Z"),
"Tags" : [
"something",
":image",
":somethingelse"
]
},
{
"_id" : ObjectId("53396223ec8bd02674b120ff"),
"UploadDate" : ISODate("2014-03-31T12:40:03.835Z"),
"Tags" : [
"something",
":somethingelse"
]
}
Current output:
{
"_id" : ObjectId("53396223ec8bd02674b1208e"),
"UploadDate" : ISODate("2014-03-31T12:40:03.835Z"),
"Tags" : [
"cars"
]
},
{
"_id" : ObjectId("53396223ec8bd02674b1208f"),
"UploadDate" : ISODate("2014-03-31T12:40:03.835Z"),
"Tags" : [
"something"
]
}
Desired output:
{
"_id" : ObjectId("53396223ec8bd02674b1208e"),
"UploadDate" : ISODate("2014-03-31T12:40:03.835Z"),
"Tags" : [
"cars",
":image"
]
},
{
"_id" : ObjectId("53396223ec8bd02674b1208f"),
"UploadDate" : ISODate("2014-03-31T12:40:03.835Z"),
"Tags" : [
"something",
":image",
":somethingelse"
]
}
As you see, I loose here all tags starting with :
. It's good enough in my case, but it could be important for someone else. I could collect IDs
first and then query them, but it's crucial to perform all operations in one single query.
Upvotes: 1