Reputation: 704
I have a flexible mongodb collection for capturing dynamic workflow field and values optimized for searching with different filters
The model is
{
"team_id" : String,
"template_id" : String,
"meta" : [
{
"k" : String,
"v" : String/Number
}
]
}
There are millions of records distributed across various team_id & template_id for each workflow record. A sample record set would be like
[
{
"team_id": "wasp",
"template_id": "order_booking",
"document_id": "d1",
"meta": [
{
"k": "assignee",
"v": "u1"
},
{
"k": "owner",
"v": "u1"
},
{
"k": "total_amount",
"v": 1000
},
{
"k": "pincode",
"v": "560001"
},
{
"k": "state",
"v": "KA"
},
]
},
{
"team_id": "bee",
"template_id": "order_booking",
"document_id": "d900",
"meta": [
{
"k": "assignee",
"v": "a1"
},
{
"k": "owner",
"v": "a1"
},
{
"k": "total_amount",
"v": 19999
},
{
"k": "pincode",
"v": "560001"
},
{
"k": "state",
"v": "KA"
},
]
}
]
I am looking at querying the data from this collection with multiple filters like
Get all records where team is wasp, template is order_booking assignee is u1 with pincode as 10011
The above query translates to
db.getCollection("xyx").aggregate(
[
{
"$match": {
"team_id": "wasp",
"template_id": "order_booking",
"meta": {
"$all": [
{
"$elemMatch": {
"k": "assignee",
"v": "u1"
}
},
{
"$elemMatch": {
"k": "pincode",
"v": "560001"
}
}
]
}
}
},
]
)
The issue with the using the elemMatch in the aggregate expression is it scans all the documents in the collection irrespective the match criteria given initially for the team_id
and template_id
How can I ensure that the initial match criteria is honoured and elemMatch criteria looks only within that subset?
Upvotes: 0
Views: 58
Reputation: 37048
To answer the question literally - you can do 2 matches in a row:
db.getCollection("xyx").aggregate(
[
{
"$match": {
"team_id": "wasp",
"template_id": "order_booking"
}, {
"$match": {
"meta": {
"$all": [
{
"$elemMatch": {
"k": "assignee",
"v": "u1"
}
},
{
"$elemMatch": {
"k": "pincode",
"v": "560001"
}
}
]
}
}
},
]
)
The second $match will test only documents that passed through the first one.
But, if you are optimising performance - add a compound index instead:
{
"team_id": 1,
"template_id": 1
}
It will use in-memory b-tree to filter documents by team_id and template_id and then fetch only matching documents to evaluate "meta".
Upvotes: 1