Reputation: 2713
In MongoDB, how can I return all documents that:
E.g., given the below collection, I want to match all the documents that:
channelId
dayOfWeek
is equal to MondayBetween all documents that satisfy 1 & 2, they should also collectively match product Ids [1, 2]. I.e., if I have a channelId
that matches dayOfWeek
, it also must have a single document that has productId
of 1 and another document with productId
of 2.
{
channelId: "ID-A",
dayOfWeek: "MONDAY",
productId: "1"
},
{
channelId: "ID-A",
dayOfWeek: "MONDAY",
productId: "2"
},
{
channelId: "ID-B",
dayOfWeek: "MONDAY",
productId: "1"
},
{
channelId: "ID-B",
dayOfWeek: "MONDAY",
productId: "3"
},
{
channelId: "ID-C",
dayOfWeek: "MONDAY",
productId: "1"
},
{
channelId: "ID-C",
dayOfWeek: "TUESDAY",
productId: "2"
}
The desired return in this case would be:
{
channelId: "ID-A",
dayOfWeek: "MONDAY",
productId: "1"
},
{
channelId: "ID-A",
dayOfWeek: "MONDAY",
productId: "2"
}
productId
equal to 2. dayOfWeek
equal to Monday.Upvotes: 2
Views: 408
Reputation: 151122
Basically you need to $group
everything together by the common key after filtering on the common condition for dayOfWeek
and then looking to see if the grouping does indeed "pair" and actually contains the two results you want to "intersect":
db.collection.aggregate([
{ "$match": { "dayOfWeek": "MONDAY" } },
{ "$group": {
"_id": "$channelId",
"docs": { "$push": "$$ROOT" },
"count": { "$sum": 1 }
}},
{ "$match": {
"count": { "$gt": 1 },
"docs": {
"$all": [
{ "$elemMatch": { "productId": "1" } },
{ "$elemMatch": { "productId": "2" } }
]
}
}}
])
Once "grouped" the $all
with $elemMatch
conditions inside it ensure that "both" conditions are met within the "grouped" documents now inside the array. Also note that if you actually mean "only two" then you can look for "count": 2
as opposed to simply "count": { "$gt": 1 }
which in example here means the "grouping" at least paired with "something", if not exactly "two".
That basically returns a result with the matching documents per grouping as:
{
"_id" : "ID-A",
"docs" : [
{
"_id" : ObjectId("5b22f455fe0315289f716483"),
"channelId" : "ID-A",
"dayOfWeek" : "MONDAY",
"productId" : "1"
},
{
"_id" : ObjectId("5b22f455fe0315289f716484"),
"channelId" : "ID-A",
"dayOfWeek" : "MONDAY",
"productId" : "2"
}
],
"count" : 2
}
If you need the "documents only" as result, then you can take that further after $unwind
with $replaceRoot
if you have a supporting MongoDB version past 3.4:
db.collection.aggregate([
{ "$match": { "dayOfWeek": "MONDAY" } },
{ "$group": {
"_id": "$channelId",
"docs": { "$push": "$$ROOT" },
"count": { "$sum": 1 }
}},
{ "$match": {
"count": { "$gt": 1 },
"docs": {
"$all": [
{ "$elemMatch": { "productId": "1" } },
{ "$elemMatch": { "productId": "2" } }
]
}
}},
{ "$unwind": "$docs" },
{ "$replaceRoot": { "newRoot": "$docs" } }
])
Or with $project
and explicitly naming all fields where you don't:
db.collection.aggregate([
{ "$match": { "dayOfWeek": "MONDAY" } },
{ "$group": {
"_id": "$channelId",
"docs": {
"$push": {
"_id": "$_id",
"channelId": "$channelId",
"dayOfWeek": "$dayOfWeek",
"productId": "$productId
}
},
"count": { "$sum": 1 }
}},
{ "$match": {
"count": { "$gt": 1 },
"docs": {
"$all": [
{ "$elemMatch": { "productId": "1" } },
{ "$elemMatch": { "productId": "2" } }
]
}
}},
{ "$unwind": "$docs" },
{ "$project": {
"_id": "$docs._id",
"channelId": "$docs.channelId",
"dayOfWeek": "$docs.dayOfWeek",
"productId": "$docs.productId"
}}
])
Actually in that last form the statement is basically compatible with every release of MongoDB since the aggregation framework was released with version 2.2.
Alternately you can actually "use" the $setIsSubset
operator as long as you have a MongoDB 3.6 or greater version with $expr
:
db.collection.aggregate([
{ "$match": { "dayOfWeek": "MONDAY" } },
{ "$group": {
"_id": "$channelId",
"docs": { "$push": "$$ROOT" },
"count": { "$sum": 1 }
}},
{ "$match": {
"count": { "$gt": 0 },
"$expr": {
"$setIsSubset": [ [ "1", "2" ], "$docs.productId" ]
}
}}
])
And you can even vary that with $redact
or using $project
followed by another $match
, but it's really not the point as though you likely thought "set operators" these are not actually best suited to the particular results you are looking for here.
Note though that any kind of "intersection" or "subset" basically relies on being able to compare the documents against each other. This essentially means putting the "grouped" things into an array for such a comparison. If the actual results size causes such a "grouping" to exceed the BSON limit, them you cannot actually use such and approach and really have no other option than to load the matching documents to the initial query filter through a cursor and examine.
For "completeness" therefore you can possibly consider in that case where you have $lookup
available to use a "self referencing join" as opposed to using $push
to accumulate the matching documents:
db.collection.aggregate([
{ "$match": { "dayOfWeek": "MONDAY" } },
{ "$group": {
"_id": "$channelId",
"count": { "$sum": 1 }
}},
{ "$match": { "count": { "$gt": 1 } } }, // keep only "multiple" groups
{ "$lookup": {
"from": "collection",
"localField": "_id",
"foreignField": "channelId",
"as": "docs"
}},
{ "$unwind": "$docs" },
// ** See note below about the $match **
//{ "$match": { "docs.productId": { "$in": [ "1", "2" ] } } },
])
The advantage here is the "array" of "docs"
never actually gets constructed as per the special handling of $lookup
+ $unwind
Coalescence which essentially "rolls up" the unwinding
action to be within the $lookup
itself. In this way you get the same documents was would be accumulated via the $push
operation, but already "separated" into their own documents in a way which does not break the 16MB BSON limit.
The limitation remains however that the "set" cannot actually be compared whilst in this form as you need the "array" in order to see if "grouped" items are within the "set". So it's actually that "grouping" action which is being avoided to avoid the limit breach. Nonetheless this is generally better than simply iterating the cursor of matching documents to "MONDAY"
alone since you already have an indication of the "grouped" results by "channelId"
.
The only other comparison that can be done there is using an additional $match
using $in
. This again would be "rolled up" into the actual $lookup
operation to efficiently only return those documents which also matched that condition. The result however essentially "negates" as it's really just the same results that can be achieved using $in
with the initial query, and of course means "only" those documents containing either "1"
or "2"
and not any others with respect to those values being "part of the subset" instead.
Upvotes: 1