Reputation: 880
I have two different documents I would like to group with a matching field (idOrdem
in this case). Both has a timestamp
filed and only one has a timeElapsed
field. I need all of this information to be returned in the aggregation, but the result I'm getting doesn't seem to be correct. I get null
results for the timeElapsed
field where there is definetly a document with this field.
What is wrong with my statement?
db.Logging.aggregate( [
{ $match : { $or : [ { "action":"resetDslVerify"}, { "action":"assia/reset/RequestQueryOs" } ] } },
{ $group : {
_id : "$idOrdem",
timestamp1: { $first: '$timestamp' },
timestamp2: { $last: '$timestamp' },
timeElapsed: { $first: '$timeElapsed' }
}
},
{ $sort: { timestamp: -1} } ]
);
The unexpected result:
{ "_id" : "159251", "timestamp1" : 1483456382058, "timestamp2" : 1483456382058, "timeElapsed" : 1091 }
{ "_id" : "134601", "timestamp1" : 1482949316671, "timestamp2" : 1482949349410, "timeElapsed" : 821 }
{ "_id" : "168801", "timestamp1" : 1483560599899, "timestamp2" : 1483560564505, "timeElapsed" : null }
{ "_id" : "158901", "timestamp1" : 1483452698756, "timestamp2" : 1483452673424, "timeElapsed" : null }
{ "_id" : "135001", "timestamp1" : 1482949653229, "timestamp2" : 1482949711541, "timeElapsed" : 838 }
An example of documents with matching idOrdem
with all the information I need:
s-1:PRIMARY> db.Logging.find( { $or : [ { "action":"resetDslVerify"}, { "action":"assia/reset/RequestQueryOs" } ], "idOrdem":"135001" } );
{ "_id" : ObjectId("586404155b88db1209c3f998"), "success" : true, "action" : "assia/reset/RequestQueryOs", "timestamp" : 1482949653229, "httpCode" : 200, "timeElapsed" : 838, "idOrdem" : "135001", "creator" : "TecnicoVirtual" }
{ "_id" : ObjectId("5864044f5b88db1209c3f99b"), "success" : true, "action" : "resetDslVerify", "timestamp" : 1482949711541, "terminal" : "2134599099", "httpCode" : 200, "idOrdem" : "135001", "idOrdem" : "135001", "result" : "OK", "timestamp" : 1482949711541, "isResetDslSuccess" : true, "creator" : "TecnicoVirtual" }
Upvotes: 0
Views: 207
Reputation: 103475
The $first
and $last
accumulators return a value from the first/last document for each group respectively. Order is only defined if the documents are in a defined order but in your case you are ordering
them AFTER
grouping them in arbitrary order, which renders the accumulators useless as they only return the documents in that undefined order, hence getting the strange results.
To debug the pipeline, run it cumulatively adding each step and inspect the resulting documents at each step. So for example, you could start with just the $match
pipeline and verify if the results at that stage only are the expected:
db.Logging.aggregate([
{
"$match": {
"action": {
"$in": [
"resetDslVerify",
"assia/reset/RequestQueryOs"
]
}
}
}
])
Observe the resulting documents at this stage which should be documents that match the given criteria (shortened to use the $in
operator in your case) and arbitrary ordered.
Add the next pipeline step:
db.Logging.aggregate([
{
"$match": {
"action": {
"$in": [
"resetDslVerify",
"assia/reset/RequestQueryOs"
]
}
}
},
{
"$group": {
"_id": "$idOrdem",
"timestamp1": { "$first": "$timestamp" }
"timestamp2": { "$last": "$timestamp" }
"timeElapsed": { "$first": "$timeElapsed" }
}
}
])
Now this is where things get interesting, the documents from the resulting pipeline are in arbitrary order because they entered the $group
phase in that order. Even placing a final $sort
pipeline step will not remove the fly in the celebration ointment: it won't alter the order of the original documents, just the order of the GROUP
.
The solution is to filter out the nulls in the $match
step, place the $sort
operator before the $group
pipeline, as well as adding the other sort field which will be your group by key and the timeElapsed
attribute:
db.Logging.aggregate([
{
"$match": {
"action": {
"$in": [
"resetDslVerify",
"assia/reset/RequestQueryOs"
]
},
"timestamp": { "$ne": null },
"timeElapsed": { "$ne": null }
}
},
{ "$sort": { "idOrdem": 1, "timestamp": -1, "timeElapsed": -1 } }
{
"$group": {
"_id": "$idOrdem",
"timestamp1": { "$first": "$timestamp" }
"timestamp2": { "$last": "$timestamp" }
"timeElapsed": { "$first": "$timeElapsed" }
}
}
])
A better approach which doesn't use the $sort
pipeline would be to use the $max
and $min
operators:
db.Logging.aggregate([
{
"$match": {
"action": {
"$in": [
"resetDslVerify",
"assia/reset/RequestQueryOs"
]
}
}
},
{
"$group": {
"_id": "$idOrdem",
"timestamp1": { "$max": "$timestamp" }
"timestamp2": { "$min": "$timestamp" }
"timeElapsed": { "$max": "$timeElapsed" }
}
}
])
Upvotes: 1
Reputation: 9295
$first will take the value for the first document even if the field doesn't exist fr this doc. You can fix this issue like this:
db.Logging.aggregate([
{
$match:{
$or:[
{
"action":"resetDslVerify"
},
{
"action":"assia/reset/RequestQueryOs"
}
]
}
},
{
$group:{
_id:"$idOrdem",
timestamp1:{
$first:'$timestamp'
},
timestamp2:{
$last:'$timestamp'
},
timeElapsed:{
$push:"$timeElapsed"
}
}
},
{
$project:{
_id:1,
timestamp1:1,
timestamp2:1,
timeElapsed:{
$arrayElemAt:[
"$timeElapsed",
0
]
}
}
},
{
$sort:{
timestamp:-1
}
}
]);
output: (for the data you provided)
{ "_id" : "135901", "timestamp1" : 1482950884849, "timestamp2" : 1482950907877, "timeElapsed" : 801 }
Upvotes: 1