Pedro Daumas
Pedro Daumas

Reputation: 303

Find most recent data per day

Hey guys I have a database that is updated every few hours but I'm having a hard time trying to query the most recent data from each day from a date range.

My database structure looks like this

{
        "_id" : ObjectId("5b96787ebe9d44528eaa18a0"),
        "values" : [{}],
        "user": "user10",
        "date" : "2018-09-09 00:00:00",
        "type" : "patient",
        "extracted_date" : "2018-09-10 10:58:18"
}

I was able to get close to the behaivor that I want by grouping date and getting the last extracted date, like this:

{ "_id" : "2018-09-15 00:00:00", "extracted_date" : "2018-09-19 13:50:22" }
{ "_id" : "2018-09-16 00:00:00", "extracted_date" : "2018-09-19 13:47:26" }
{ "_id" : "2018-09-17 00:00:00", "extracted_date" : "2018-09-19 13:45:00" }
{ "_id" : "2018-09-11 00:00:00", "extracted_date" : "2018-09-12 10:09:17" }
{ "_id" : "2018-09-12 00:00:00", "extracted_date" : "2018-09-14 15:34:59" }
{ "_id" : "2018-09-14 00:00:00", "extracted_date" : "2018-09-19 13:54:34" }
{ "_id" : "2018-09-13 00:00:00", "extracted_date" : "2018-09-14 15:36:10" }
{ "_id" : "2018-09-18 00:00:00", "extracted_date" : "2018-09-19 13:42:23" }

But when I group the data I end up getting all the values from that day and I only need the last one. Here's an example of the query I used:

db.collection.aggregate({'$match': {'type': 'user', 'date': {'$gte': '2018-09-11 00:00:00', '$lte': '2018-09-18 00:00:00'}}}, {'$group': {'_id': {'type': '$type', 'user': '$user', 'date': '$date'}, 'extracted_date': {'$last': '$extracted_date'}, 'values': {'$push': '$values'}}})

If possile I would like to retrive the information as close to the structure utilized by the database.

Thank you very much for your help!

Edit: This is a case example that I need.

database objects:

{
        "_id" : ObjectId("5b96787ebe9d44528eaa18a0"),
        "values" : [{'field1': 1, 'field2': 3}],
        "user": "user10",
        "date" : "2018-09-09 00:00:00",
        "type" : "patient",
        "extracted_date" : "2018-09-10 10:58:18"
}
{
        "_id" : ObjectId("5b96787ebe9d44528eaa18a0"),
        "values" : [{'field1': 1, 'field2': 4}],
        "user": "user10",
        "date" : "2018-09-09 00:00:00",
        "type" : "patient",
        "extracted_date" : "2018-09-11 10:58:18"
}
{
        "_id" : ObjectId("5b96787ebe9d44528eaa18a0"),
        "values" : [{'field11': 2, 'field2': 10}],
        "user": "user11",
        "date" : "2018-09-05 00:00:00",
        "type" : "patient",
        "extracted_date" : "2018-09-10 10:58:18"
}

Expected return:

{
        "_id" : ObjectId("5b96787ebe9d44528eaa18a0"),
        "values" : [{'field1': 1, 'field2': 4}],
        "user": "user10",
        "date" : "2018-09-09 00:00:00",
        "type" : "patient",
        "extracted_date" : "2018-09-11 10:58:18"
}
{
        "_id" : ObjectId("5b96787ebe9d44528eaa18a0"),
        "values" : [{'field11': 2, 'field2': 10}],
        "user": "user11",
        "date" : "2018-09-05 00:00:00",
        "type" : "patient",
        "extracted_date" : "2018-09-10 10:58:18"
}

Since there's 2 objected from the same date it only returns the one with the most recent extracted_date

Upvotes: 1

Views: 87

Answers (2)

Akrion
Akrion

Reputation: 18525

If I understand you correctly you want to just get the last values array and not all of them for that day combined ...

So just do $last for the values like you did for extracted_date.

UPDATE:

Since you are looking for the most recent data for that range you need to use the sort as per matthPen suggestion and just get the needed fields out of the _id and then hide the _id:

db.collection.aggregate([{
    "$match": {
      "type": "patient",
      "date": {
        "$gte": "2018-09-05 00:00:00",
        "$lte": "2018-09-18 00:00:00"
      }
    }
  },
  {
    "$sort": {
      "extracted_date": 1
    }
  },
  {
    "$group": {
      "_id": {
        "type": "$type",
        "user": "$user",
        "date": "$date"
      },
      "id": {
        $last: "$_id"
      },
      "date": {
        $last: "$date"
      },
      "type": {
        $last: "$type"
      },
      "extracted_date": {
        $last: "$extracted_date"
      },
      "values": {
        $last: "$values"
      }
    }
  },
  {
    "$project": {
      "_id": 0
    }
  }
])

You can see it here

Upvotes: 1

matthPen
matthPen

Reputation: 4363

Think you are confusing between 'the last per day' (the one which appened the later in each day) and $last (the last coming from previous stage in pipeline)!

You need to add a sort stage before grouping, to ensure that $last is 'the last'.

db.collection.aggregate({
  "$match": {
    "type": "user",
    "date": {
      "$gte": "2018-09-11 00:00:00",
      "$lte": "2018-09-18 00:00:00"
    }
  }
}, 
{$sort:{
  date:1,
  extracted_date:1
}
},
{
  "$group": {
    "_id": {
      "type": "$type",
      "user": "$user",
      "date": "$date"
    },
    "extracted_date": {
      "$last": "$extracted_date"
    },
    "values": {
      "$last": "$values"
    }
  }
})

Upvotes: 0

Related Questions