user1682076
user1682076

Reputation:

Match conditions and latest date from array

db.chat.find().pretty().limit(3)
{
    "_id" : ObjectId("593921425ccc8150f35e7662"),
    "user1" : 1,
    "user2" : 2,
    "messages" : [
        {
            "sender" : 1,
            "datetime" : ISODate("2017-06-08T10:04:50Z"),
            "body" : "hiii 0"
        },
        {
            "sender" : 2,
            "datetime" : ISODate("2017-06-09T10:04:50Z"),
            "body" : "hiii 1"
        },
        {
            "sender" : 2,
            "datetime" : ISODate("2017-06-10T10:04:50Z"),
            "body" : "hiii 2"
        }
    ]
}
{
    "_id" : ObjectId("593921425ccc8150f35e7663"),
    "user1" : 1,
    "user2" : 3,
    "messages" : [
        {
            "sender" : 1,
            "datetime" : ISODate("2017-06-08T10:04:50Z"),
            "body" : "hiii 0"
        },
        {
            "sender" : 3,
            "datetime" : ISODate("2017-06-09T10:04:50Z"),
            "body" : "hiii 1"
        },
        {
            "sender" : 1,
            "datetime" : ISODate("2017-06-10T10:04:50Z"),
            "body" : "hiii 2"
        }
    ]
}
{
    "_id" : ObjectId("593921425ccc8150f35e7664"),
    "user1" : 1,
    "user2" : 4,
    "messages" : [
        {
            "sender" : 1,
            "datetime" : ISODate("2017-06-08T10:04:50Z"),
            "body" : "hiii 0"
        },
        {
            "sender" : 1,
            "datetime" : ISODate("2017-06-09T10:04:50Z"),
            "body" : "hiii 1"
        },
        {
            "sender" : 4,
            "datetime" : ISODate("2017-06-10T10:04:50Z"),
            "body" : "hiii 2"
        }
    ]
}

Above mongodb collection store chats between user1 and user2. I am looking for a query that will give me result of the latest messages where message.sender = 1 of each row.

i.e looking for 3 rows as output

For user1=1 and user2, message hiii0 should only come
For user1=1 and user3, message hiii2 should only come
For user1=1 and user4, message hiii1 should only come

ie. just three rows.

db.chat.find({"messages.sender":1}) is giving all the rows while i am looking for just the matched row with the latest datetime. Please help

Example:

db.chat.aggregate([
  {$unwind:"$messages"},
  {$match:{"messages.sender":1}},
  {$sort:{"messages.datetime":-1}
])

is giving output as

{ "_id" : ObjectId("593921425ccc8150f35e7663"), "user1" : 1, "user2" : 3, "messages" : { "sender" : 1, "datetime" : ISODate("2017-06-10T10:04:50Z"), "body" : "hiii 2" } }
{ "_id" : ObjectId("593921425ccc8150f35e7664"), "user1" : 1, "user2" : 4, "messages" : { "sender" : 1, "datetime" : ISODate("2017-06-09T10:04:50Z"), "body" : "hiii 1" } }
{ "_id" : ObjectId("593921425ccc8150f35e7662"), "user1" : 1, "user2" : 2, "messages" : { "sender" : 1, "datetime" : ISODate("2017-06-08T10:04:50Z"), "body" : "hiii 0" } }
{ "_id" : ObjectId("593921425ccc8150f35e7663"), "user1" : 1, "user2" : 3, "messages" : { "sender" : 1, "datetime" : ISODate("2017-06-08T10:04:50Z"), "body" : "hiii 0" } }
{ "_id" : ObjectId("593921425ccc8150f35e7664"), "user1" : 1, "user2" : 4, "messages" : { "sender" : 1, "datetime" : ISODate("2017-06-08T10:04:50Z"), "body" : "hiii 0" } }

Last two rows are not desirable as its its not the latest record for user1-user2 record.

If i am adding ,{$limit:1} , its giving just one row.

Upvotes: 0

Views: 1827

Answers (1)

Neil Lunn
Neil Lunn

Reputation: 151072

The basic concept here is you need the aggregation framework in order to apply conditions to "filter" the array elements to the conditions. Depending on the available version there are different techniques that can be applied.

In all cases this is the result:

{
    "_id" : ObjectId("593921425ccc8150f35e7664"),
    "user1" : 1,
    "user2" : 4,
    "messages" : {
            "sender" : 1,
            "datetime" : ISODate("2017-06-09T10:04:50Z"),
            "body" : "hiii 1"
    }
}
{
    "_id" : ObjectId("593921425ccc8150f35e7663"),
    "user1" : 1,
    "user2" : 3,
    "messages" : {
            "sender" : 1,
            "datetime" : ISODate("2017-06-10T10:04:50Z"),
            "body" : "hiii 2"
    }
}
{
    "_id" : ObjectId("593921425ccc8150f35e7662"),
    "user1" : 1,
    "user2" : 2,
    "messages" : {
            "sender" : 1,
            "datetime" : ISODate("2017-06-08T10:04:50Z"),
            "body" : "hiii 0"
    }
}

MongoDB 3.4 and above

db.chat.aggregate([
  { "$match": { "messages.sender": 1 } },
  { "$replaceRoot": {
    "newRoot": {
      "$let": {
        "vars": {
          "messages": {
            "$filter": {
              "input": "$messages",
              "as": "m",
              "cond": { "$eq": [ "$$m.sender", 1 ] }
            }
          },
          "maxDate": {
            "$max": {
              "$map": {
                "input": {
                  "$filter": {
                    "input": "$messages",
                    "as": "m",
                    "cond": { "$eq": [ "$$m.sender", 1 ] }
                  }
                },
                "as": "m",
                "in": "$$m.datetime"
              }
            }
          }
        },
        "in": {
          "_id": "$_id",
          "user1": "$user1",
          "user2": "$user2",
          "messages": {
            "$arrayElemAt": [
              { "$filter": {
                "input": "$$messages",
                "as": "m",
                "cond": { "$eq": [ "$$m.datetime", "$$maxDate" ] }
              }},
              0
            ]
          }    
        }
      }
    }
  }}
])

This is the most efficient way which takes advantage of $replaceRoot which allows us to declare variables to use in the "replaced" structure using $let. The main advantage here is that this requires only "two" pipeline stages.

In order to match the array content you use $filter where you apply the $eq logical operation to test the value of "sender". Where the condition matches, then only the matching array entries are returned.

Using the same $filter so that only the matching "sender" entries are considered, we then want to apply $max over the "filtered" list to the values in "datetime". The $max]5 value is the "latest" date by the conditions.

We want this value so we can later compare the returned results from the "filtered" array to this "maxDate". Which is what happens inside the "in" block of $let where the two "variables" declared earlier for the filtered content and the "maxDate" are again applied to $filter in order to return what should be the only value that met both conditions having the "latest date" as well.

Since you only want "one" result, we use $arrayElemAt to use the value rather than the array.

MongoDB 3.2

db.chat.aggregate([
  { "$match": { "messages.sender": 1 } },
  { "$project": {
    "user1": 1,
    "user2": 1,
    "messages": {
      "$filter": {
        "input": "$messages",
        "as": "m",
        "cond": { "$eq": [ "$$m.sender", 1 ] }
      }
    },
    "maxDate": {
      "$max": {
        "$map": {
          "input": {
            "$filter": {
              "input": "$messages",
              "as": "m",
              "cond": { "$eq": [ "$$m.sender", 1 ] }
            }
          },
          "as": "m",
          "in": "$$m.datetime"
        }
      }
    }         
  }},
  { "$project": {
    "user1": 1,
    "user2": 1,
    "messages": {
      "$arrayElemAt":[
       { "$filter": {
         "input": "$messages",
          "as": "m",
          "cond": { "$eq": [ "$$m.datetime", "$maxDate" ] }
       }},
       0
      ]
    }
  }}
])

This is basically the same process as described, but without the $replaceRoot pipeline stage, we need to apply in two $project stages. The reason for this is we need the "calculated value" from "maxDate" in order to do that final $filter, and it is not available to do in a compound statement, so instead we split the pipelines. This has a small impact on the overall cost of the operation.

In MongoDB 2.6 to 3.0 we can use most of the technique here except for $arrayElemAt and either accept the "array" result with a single entry or put in an $unwind stage to deal with what should now be a single entry.

MongoDB earlier versions

db.chat.aggregate([
  { "$match": { "messages.sender": 1 } },
  { "$unwind": "$messages" },
  { "$match": { "messages.sender": 1 } },
  { "$sort": { "_id": 1, "messages.datetime": -1 } },
  { "$group": {
    "_id": "$_id",
    "user1": { "$first": "$user1" },
    "user2": { "$first": "$user2" },
    "messages": { "$first": "$messages" }
  }}
])

Whilst it looks brief, this is by far the most costly operation. Here you must use $unwind in order to apply the conditions to the array elements. This is a very costly process as it produces a copy of each document for each array entry, and is essentially replaced by the modern operators that avoid this in the case of "filtering".

The second $match stage here discards any elements ( now "documents" ) which did not match the "sender" condition. Then we apply a $sort in order to put the "latest" date on top for each document by the _id, hence the two "sort" keys.

Finally we apply $group in order to just refer to the original document, using $first as the accumulator to get the element that is "on top".

Upvotes: 1

Related Questions