ShadowEagle
ShadowEagle

Reputation: 520

Aggregate a collection of timestamps in MongoDB using the Aggregation Pipeline

I have a collection of timestamps which record what actions are performed by users at which time. For now, the collection consists of only two actions start and end. There can only be a single end action, while there can be multiple start actions per user.

Now I want a generate a list of users where the time difference between the last start action and the end action is - for example - less than a minute.

The simplified documents in my collection timestamps look like this:

document #1

{
  id: 123,
  user: "user1",
  type: "start",
  date: 2019-09-10
}

document #2

{
  id: 234,
  user: "user1",
  type: "end",
   date: 2019-09-11
}

Now the result I want should look like this:

{
  id: null,
  list: ["user1, user2"]
}

The field list should contain every user, where the time difference between the start and end action is less than a minute.

I am having trouble combining the documents which contain the start and end attribute. I was trying to combine them into documents that looks like this:

{
  id: 345
  user: "user1"
  date_start: 2019-09-10
  date_end: 2019-09-11
}

I don't know where to start with the aggregation pipeline and how to split and combine the different types of timestamps. Furthermore, I still need to add a field that contains the difference between both dates.

Upvotes: 4

Views: 635

Answers (1)

Himanshu Sharma
Himanshu Sharma

Reputation: 3010

The following query can get us the expected output:

db.collection.aggregate([
    {
        $sort:{
            "date":-1
        }
    },
    {
        $group:{
            "_id":{
                "id":"$id",
                "type":"$type"
            },
            "id":{
                $first:"$id"
            },
            "user":{
                $first:"$user"
            },
            "type":{
                $first:"$type"
            },
            "date":{
                $first:"$date"
            }
        }
    },
    {
        $group:{
            "_id":"$id",
            "user":{
                $first:"$user"
            },
            "info":{
                $push:{
                    "k":"$type",
                    "v":"$date"
                }
            }
        }
    },
    {
        $addFields:{
            "info":{
                $arrayToObject:"$info"
            }
        }
    },
    {
        $match:{
            $expr:{
                $lt:[
                    {
                        $subtract:[
                            {
                                $toDate:"$info.end"
                            },
                            {
                                $toDate:"$info.start"
                            }
                        ]
                    },
                    60000
                ]
            }
        }
    },
    {
        $group:{
            "_id":null,
            "users":{
                $push:"$user"
            }
        }
    },
    {
        $project:{
            "_id":0
        }
    }
]).pretty()

Data set:

{
    "_id" : ObjectId("5d77a117bd4e75c58d598214"),
    "id" : 123,
    "user" : "user1",
    "type" : "start",
    "date" : "2019-09-10T13:01:14.242Z"
}
{
    "_id" : ObjectId("5d77a117bd4e75c58d598215"),
    "id" : 123,
    "user" : "user1",
    "type" : "start",
    "date" : "2019-09-10T13:04:14.242Z"
}
{
    "_id" : ObjectId("5d77a117bd4e75c58d598216"),
    "id" : 123,
    "user" : "user1",
    "type" : "start",
    "date" : "2019-09-10T13:09:02.242Z"
}
{
    "_id" : ObjectId("5d77a117bd4e75c58d598217"),
    "id" : 123,
    "user" : "user1",
    "type" : "end",
    "date" : "2019-09-10T13:09:14.242Z"
}
{
    "_id" : ObjectId("5d77a117bd4e75c58d598218"),
    "id" : 234,
    "user" : "user2",
    "type" : "start",
    "date" : "2019-09-10T13:02:02.242Z"
}
{
    "_id" : ObjectId("5d77a117bd4e75c58d598219"),
    "id" : 234,
    "user" : "user2",
    "type" : "end",
    "date" : "2019-09-10T13:09:14.242Z"
}
{
    "_id" : ObjectId("5d77a117bd4e75c58d59821a"),
    "id" : 345,
    "user" : "user3",
    "type" : "start",
    "date" : "2019-09-10T13:08:55.242Z"
}
{
    "_id" : ObjectId("5d77a117bd4e75c58d59821b"),
    "id" : 345,
    "user" : "user3",
    "type" : "end",
    "date" : "2019-09-10T13:09:14.242Z"
}

Output:

{ "users" : [ "user3", "user1" ] }

Query analysis:

  • Stage I: Sorting the documents in descending order of the date
  • Stage II: Grouping on [id, type] and picking the first date for each type i.e. the latest date for each type
  • Stage III: Grouping only on id and pushing the type and associated date into an array as key-value pairs
  • Stage IV: Converting the array of key-value pairs into an object
  • Stage V: Filtering documents which has the difference between end and start date less than 60000 ms. (milliseconds equivalent of 1 minute)
  • Stage VI: Pushing all filtered names into an array

Upvotes: 1

Related Questions