Reputation: 21
Using MongoDB version 4.4.22 Lets say I have this document:
[
{
_id: ObjectId("6481b9762f11910c77d977ae")
hints : [
{
creatorId : "093c90be-8f19-45de-89ab-20070ef6d3f5",
textValue: "hint1"
},
{
creatorId : "093c90be-8f19-45de-89ab-20070ef6d3f5",
textValue: "hint2"
},
{
creatorId : "575b9c75-c89d-4505-aec5-b001d8836845",
textValue: "hint3"
},
{
creatorId : "575b9c75-c89d-4505-aec5-b001d8836845",
textValue: "hint4"
}
]
},
{
_id: ObjectId("6168b9751000006e9d436bd1")
hints : [
{
creatorId : "093c90be-8f19-45de-89ab-20070ef6d3f5",
textValue: "hint1"
},
{
creatorId : "575b9c75-c89d-4505-aec5-b001d8836845",
textValue: "hint3"
},
]
}
]
Every record has hints array which consist of documents that have a creatorId and a textValue I want to count for each possible combination of hints how many records contain these hints. A valid combination that I want to count is a combination which does not have the same creatorId appearing twice. For the sake of the example lets guess that each record has at least one hint from every one of the possible creators (So partial combinations not need to be supported). Important Note: 2 hints with the same textValue but with different creators are 2 different hints! Is there a way to get the result using aggregation pipeline only? (No code involved)
For the given example, this is the expected result (_id is for a $group
scenario):
[
{
_id: [
{
creatorId : "093c90be-8f19-45de-89ab-20070ef6d3f5",
textValue: "hint1"
},
{
creatorId : "575b9c75-c89d-4505-aec5-b001d8836845",
textValue: "hint3"
},
],
count: 2
},
{
_id: [
{
creatorId : "093c90be-8f19-45de-89ab-20070ef6d3f5",
textValue: "hint1"
},
{
creatorId : "575b9c75-c89d-4505-aec5-b001d8836845",
textValue: "hint4"
}
],
count: 1
},
{
_id: [
{
creatorId : "093c90be-8f19-45de-89ab-20070ef6d3f5",
textValue: "hint2"
},
{
creatorId : "575b9c75-c89d-4505-aec5-b001d8836845",
textValue: "hint3"
},
],
count: 1
},
{
_id: [
{
creatorId : "093c90be-8f19-45de-89ab-20070ef6d3f5",
textValue: "hint2"
},
{
creatorId : "575b9c75-c89d-4505-aec5-b001d8836845",
textValue: "hint4"
}
],
count: 1
}
]
I tried many pipelines but always stuck in a scenario when one records hold more then one hint from the same creator. Glad for help and ideas!
Upvotes: 2
Views: 120
Reputation: 21
Eventually I found a solution (I don't think it's the optimal one but it's working):
I can use $function step to make a JS code which runs on the hints array to make all the combinations I desire:
function (hints) {
if(!hints || hints.length == 0) return [[]]
function getPowerSet(hints) {
const powerSet = [[]]; // Initialize with an empty set
for (const hint of hints) {
const currentSetLength = powerSet.length;
for (let i = 0; i < currentSetLength; i++) {
const subset = [...powerSet[i], hint];
powerSet.push(subset);
}
}
return powerSet;
}
function filterArraysByHintUniqueness(array) {
creatorIds = array.map(l => l.creatorId)
const uniqueValues = new Set(creatorIds);
return array.length === uniqueValues.size;
}
return getPowerSet(hints)
.filter(filterArraysByHintUniqueness)
.filter((array) => array.length == numOfCreators)
.sort((array) => array.sort((a, b) => a.creatorId.localeCompare(b.creatorId)))
}
numOfCreators is known and we can use it dynamically. I used the step like:
{
"$addFields" : {
"possibleCombinations" : {
"$function" : {
"body" : <The JS code mentioned above>,
"args" : ["$hints"],
"lang" : "js"
}
}
}
}
After that a simple stage of $unwind and a simple stage of $group on the possibleCombinations field did the trick.
Upvotes: 0