Shark

Reputation: 2382

Group Mongo documents by id and get the latest document by timestamp

Imagine we have the following set of documents stored in mongodb:

{ "fooId" : "1", "status" : "A", "timestamp" : ISODate("2016-01-01T00:00:00.000Z"), "otherInfo" : "BAR", ... }
{ "fooId" : "1", "status" : "B", "timestamp" : ISODate("2016-01-02T00:00:00.000Z"), "otherInfo" : "BAR", ... }
{ "fooId" : "1", "status" : "C", "timestamp" : ISODate("2016-01-03T00:00:00.000Z"), "otherInfo" : "BAR", ... }
{ "fooId" : "2", "status" : "A", "timestamp" : ISODate("2016-01-01T00:00:00.000Z"), "otherInfo" : "BAR", ... }
{ "fooId" : "2", "status" : "B", "timestamp" : ISODate("2016-01-02T00:00:00.000Z"), "otherInfo" : "BAR", ... }
{ "fooId" : "3", "status" : "A", "timestamp" : ISODate("2016-01-01T00:00:00.000Z"), "otherInfo" : "BAR", ... }
{ "fooId" : "3", "status" : "B", "timestamp" : ISODate("2016-01-02T00:00:00.000Z"), "otherInfo" : "BAR", ... }
{ "fooId" : "3", "status" : "C", "timestamp" : ISODate("2016-01-03T00:00:00.000Z"), "otherInfo" : "BAR", ... }
{ "fooId" : "3", "status" : "D", "timestamp" : ISODate("2016-01-04T00:00:00.000Z"), "otherInfo" : "BAR", ... }

I'd like to get the latest status for each fooId based on timestamp. The result should therefore look like this:

{ "fooId" : "1", "status" : "C", "timestamp" : ISODate("2016-01-03T00:00:00.000Z"), "otherInfo" : "BAR", ... }
{ "fooId" : "2", "status" : "B", "timestamp" : ISODate("2016-01-02T00:00:00.000Z"), "otherInfo" : "BAR", ... }
{ "fooId" : "3", "status" : "D", "timestamp" : ISODate("2016-01-04T00:00:00.000Z"), "otherInfo" : "BAR", ... }

I've been trying to do this with an aggregation using the $group operator, but I'm wondering whether there is an easy way to get the whole document back from an aggregation, so that the result looks the same as if I had used a find query. It seems you have to specify all the fields when you group, which doesn't seem extensible if documents can have optional fields that may be unknown to me. The current query I have looks like this:

db.collectionName.aggregate(
   [
     { $sort: { timestamp: 1 } },
     {
       $group:
         {
           _id: "$fooId",
           timestamp: { $last: "$timestamp" },
           status: { "$last": "$status" },
           otherInfo: { "$last": "$otherInfo" },
         }
     }
   ]
)

Upvotes: 6

Views: 5474

Answers (3)

Sede

Reputation: 61225

You can use the $$ROOT system variable with the $last operator to return the last document.

db.collectionName.aggregate([      
    { "$sort": { "timestamp": 1 } },     
    { "$group": { 
        "_id": "$fooId",   
        "last_doc": { "$last": "$$ROOT" } 
    }}
])

Of course, this will return the last document for each group as the value of a field:

{
        "_id" : "2",
        "last_doc" : {
                "_id" : ObjectId("570e6df92f5bb4fcc8bb177e"),
                "fooId" : "2",
                "status" : "B",
                "timestamp" : ISODate("2016-01-02T00:00:00Z")
        }
}
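
To make the semantics concrete, here is a minimal in-memory sketch in plain JavaScript of what the $sort stage followed by $group with $last on $$ROOT computes. The helper name latestPerGroup and the trimmed sample data are assumptions for illustration only; this is not MongoDB code and not how the server implements the stages.

```javascript
// Hypothetical helper (not a MongoDB API): emulates
//   { $sort: { timestamp: 1 } }
//   { $group: { _id: "$fooId", last_doc: { $last: "$$ROOT" } } }
function latestPerGroup(docs, keyField, timeField) {
  // Sort a copy in ascending time order, like the $sort stage.
  const sorted = [...docs].sort((a, b) => a[timeField] - b[timeField]);
  const groups = new Map();
  for (const doc of sorted) {
    // Later documents overwrite earlier ones, so the last one wins ($last).
    groups.set(doc[keyField], doc);
  }
  // Shape each entry like the $group output: the whole document under one field.
  return [...groups].map(([key, doc]) => ({ _id: key, last_doc: doc }));
}

// Sample data modelled on the question (otherInfo omitted for brevity).
const docs = [
  { fooId: "1", status: "A", timestamp: new Date("2016-01-01T00:00:00Z") },
  { fooId: "1", status: "B", timestamp: new Date("2016-01-02T00:00:00Z") },
  { fooId: "1", status: "C", timestamp: new Date("2016-01-03T00:00:00Z") },
  { fooId: "2", status: "A", timestamp: new Date("2016-01-01T00:00:00Z") },
  { fooId: "2", status: "B", timestamp: new Date("2016-01-02T00:00:00Z") },
  { fooId: "3", status: "D", timestamp: new Date("2016-01-04T00:00:00Z") },
  { fooId: "3", status: "A", timestamp: new Date("2016-01-01T00:00:00Z") }
];

const result = latestPerGroup(docs, "fooId", "timestamp");
console.log(result.map((r) => r._id + ":" + r.last_doc.status));
```

Each entry carries the complete original document under last_doc, so optional fields survive without being listed by name.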

If you are not happy with that output, your best bet is to add another $group stage to the pipeline, in which you simply return an array of those documents using the $push accumulator operator.

db.collectionName.aggregate([      
    { "$sort": { "timestamp": 1 } },     
    { "$group": { 
        "_id": "$fooId",   
        "last_doc": { "$last": "$$ROOT" } 
    }},
    { "$group": { 
        "_id": null, 
        "result": { "$push": "$last_doc" } 
    }}
])

Upvotes: 3

Saleem

Reputation: 8978

There is no direct way to get the original documents back from a $group stage (and I don't see much value in doing so), but you can try the following aggregation query:

db.collection.aggregate([
   {$sort: {fooId:1, timestamp: -1}},
   {$group:{_id:"$fooId", doc:{$first:"$$ROOT"}}},
   {$project:{_id:0, doc:["$doc"]}}
]).forEach(function(item){

  printjson(item.doc[0]);

});

This query will emit:

{ 
    "_id" : ObjectId("570e76d5e94e6584078f02c4"), 
    "fooId" : "2", 
    "status" : "B", 
    "timestamp" : ISODate("2016-01-02T00:00:00.000+0000"), 
    "otherInfo" : "BAR"
}
{ 
    "_id" : ObjectId("570e76d5e94e6584078f02c8"), 
    "fooId" : "3", 
    "status" : "D", 
    "timestamp" : ISODate("2016-01-04T00:00:00.000+0000"), 
    "otherInfo" : "BAR"
}
{ 
    "_id" : ObjectId("570e76d5e94e6584078f02c2"), 
    "fooId" : "1", 
    "status" : "C", 
    "timestamp" : ISODate("2016-01-03T00:00:00.000+0000"), 
    "otherInfo" : "BAR"
}

Upvotes: 0

Koitoer

Reputation: 19533

If you are doing an aggregation, you need to work much as you would in SQL, which means specifying an aggregation operation per field. The only other option is to use the $$ROOT system variable:

db.test.aggregate(
   [
    { $sort: { timestamp: 1 } },
     {
       $group:
         {
           _id: "$fooId",
           timestamp: { $last: "$$ROOT" }
         }
     }
   ]
);

But that will change the output a little bit, because the whole document is nested under the timestamp field:

{ "_id" : "1", "timestamp" : { "_id" : ObjectId("570e6be3e81c8b195818e7fa"), 
  "fooId" : "1", "status" : "A", "timestamp" :ISODate("2016-01-01T00:00:00Z"), 
  "otherInfo" : "BAR" } }

If you want to return the original document format, you will probably need a $project stage after that.
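
As an alternative to listing fields by hand, here is a sketch that promotes the nested document back to the top level with the $replaceRoot stage. It assumes a server version of 3.4 or later, where $replaceRoot is available; the field name last_doc and the variable name pipeline are choices made for this illustration, and the collection name is taken from the question.

```javascript
// Sketch only: assumes MongoDB 3.4+ so that $replaceRoot exists.
const pipeline = [
  // Oldest first, so that $last picks the newest document per group.
  { $sort: { timestamp: 1 } },
  // Keep the entire latest document under a single (illustrative) field.
  { $group: { _id: "$fooId", last_doc: { $last: "$$ROOT" } } },
  // Replace each group result with the document it carries.
  { $replaceRoot: { newRoot: "$last_doc" } }
];

// In the mongo shell this would be run as:
//   db.collectionName.aggregate(pipeline)
```

Under that assumption, each result should have the same shape as a document returned by find, including any optional fields, since nothing is enumerated by name.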

Upvotes: 4
