Vegar

Reputation: 12898

Match next and previous to the queried item

The problem:

I have a collection with events of different types. The date for an event might change, and we want to know when it changed and what it changed from, so the current date is stored in an array together with all previous dates.

[
  { 
     _id: ..., 
     eventDates : [
       { created: ISODate(...), eventDate: ISODate(...) },
       { created: ISODate(...), eventDate: ISODate(...) }
     ],
     eventType: ...,
     otherData: ....
  }
]

So basically, each object has a type and a list of dates, where the last created eventDate is the one that counts.

For each eventType, I want to list the previous and the next event relative to a given date.

An example:

So given the following data:

[ { 
    _id: 1,
    eventType: "sports", 
    eventDates: [ 
      //first scheduled to 1st of February... 
      { created: ISODate(2015-01-01), eventDate: ISODate(2015-02-01) },
      //...but later rescheduled to 10th of February.
      { created: ISODate(2015-01-02), eventDate: ISODate(2015-02-10) }
    ],
    otherData: ...
  },
  { 
    _id: 2,
    eventType: "sports", 
    eventDates: [ 
      //Scheduled to 5th of February... 
      { created: ISODate(2015-01-10), eventDate: ISODate(2015-02-05) }
    ],
    otherData: ...
  },
  { 
    _id: 3,
    eventType: "sports", 
    eventDates: [ 
      //Scheduled to 1st of March... 
      { created: ISODate(2015-01-20), eventDate: ISODate(2015-03-01) },
      //...but later rescheduled to 20th of February.
      { created: ISODate(2015-01-30), eventDate: ISODate(2015-02-20) }
    ],
    otherData: ...
  }
]

I would expect the following output for the date 2015-02-15:

[ {
     eventType: "sports",
     previousEvent: { 
          _id: 1,
          eventType: "sports", 
          eventDates: [ 
              { created: ISODate(2015-01-01), eventDate: ISODate(2015-02-01) },
              { created: ISODate(2015-01-02), eventDate: ISODate(2015-02-10) }
          ],
          otherData: ...
     },
     nextEvent: {
          _id: 3,
          eventType: "sports", 
          eventDates: [ 
             { created: ISODate(2015-01-20), eventDate: ISODate(2015-03-01) },
             { created: ISODate(2015-01-30), eventDate: ISODate(2015-02-20) }
          ],
          otherData: ...
     }
  }
]

For the date 2015-03-01, I would have the 3rd event as previous, and the next would be null (no more events scheduled). Similarly, for the date 2015-02-01, I would have the 2nd event as the next, and the previous would be null (no past events).

My attempt

The first step would be to find the current date for the event. To do that, I'll first $unwind all dates so that I can select the newest later:

db.getCollection("Events").aggregate([
  {$unwind: "$eventDates" }
])

=> { result: [
       { _id: ...., eventDates: { created: ISODate(...), eventDate: ISODate(...) }, otherData: ...},
       { _id: ...., eventDates: { created: ISODate(...), eventDate: ISODate(...) }, otherData: ...},
       { _id: ...., eventDates: { created: ISODate(...), eventDate: ISODate(...) }, otherData: ...}
     ], ok: 1 } 

Next, I'll try to throw away all old eventDates by $group and $max

db.getCollection("Events").aggregate([
  {$unwind: "$eventDates" },
  {$group: { _id: "$_id", eventDate: { $max: "$eventDates.created" }}}
])

=> { result: [
     { _id: ...., eventDate: ISODate(...) },
     { _id: ...., eventDate: ISODate(...) }
  ], ok: 1 }

But now, I have only the id of each event together with the last time the date for the event was set. I don't have the event date itself, nor the rest of the event data.

How can I tell the $group step to return everything from the document with the highest eventDates.created value?

The second step would be to $project the eventDate into a new 'is-in-the-past' field. That's a simple one, I guess.

db.getCollection("Events").aggregate([
  {$unwind: "$eventDates" },
  {$project: { event: "$$ROOT", inThePast: {$lt: ["$eventDates.eventDate", new Date()]}}},
  {$group: { _id: "$_id", eventDate: { $max: "$event.eventDates.created" }}}
])

The third step is a new challenge. For each eventType, I want the past event with the highest date, and the future event with the lowest date....

What I want to end up with is a structure similar to this one:

 { result: [
    {  eventType: ....,
       previous: { _id: ...., eventDate: ISODate(...), otherData: ...},
       next: { _id: ...., eventDate: ISODate(...), otherData: ....}
    }
   ], ok: 1 }

Am I on the right track at all with this?

Upvotes: 1

Views: 1108

Answers (1)

Blakes Seven

Reputation: 50416

Scrubbing the previous response now that we are clear on the objective, which is to get the "next" occurrence of an event after the given date and also the "previous" occurrence before it. These are the events in each grouping that are "nearest" to the queried date among the "preceding" and "following" dates found.

Here is a minimal data sample, showing two records with the same schema pattern:

[
    {
        "eventDates": [
            { "created": new Date("2015-08-01"), "eventDate": new Date("2015-08-01") },
            { "created": new Date("2015-08-02"), "eventDate": new Date("2015-08-02") },
            { "created": new Date("2015-08-03"), "eventDate": new Date("2015-08-03") },
        ],
        "eventType": "sport",
        "otherData": "something"            
    },
    {
        "eventDates": [
            { "created": new Date("2015-08-04"), "eventDate": new Date("2015-08-04") },
            { "created": new Date("2015-08-05"), "eventDate": new Date("2015-08-05") },
            { "created": new Date("2015-08-06"), "eventDate": new Date("2015-08-06") },
        ],
        "eventType": "sport",
        "otherData": "something"            
    }
]

The objective will be to find the "previous" and "next" event dates without any prior knowledge that these in fact are just single days within the sample. The date we will use is new Date("2015-08-03").

Here is the listing (heavily commented for explanation):

var currDate = new Date("2015-08-03");

db.getCollection("Events").aggregate([
    // Unwind the array
    { "$unwind": "$eventDates" },
    // Group and identify "next" and "prev", while pushing array of documents
    { "$group": {
        "_id": "$eventType",
        // All of the events in group with the difference from the date
        "all": {
            "$push": {
                "_id": "$_id",
                "eventDates": "$eventDates",
                "otherData": "$otherData",
                "diff": { "$subtract": [ currDate, "$eventDates.eventDate" ] }
            }
        },
        // The largest negative difference (closest to zero) from the date
        "next": {
            "$max": {
                "$cond": [
                    { "$lt": [ currDate, "$eventDates.eventDate" ] },
                    { "$subtract": [ currDate, "$eventDates.eventDate" ] },
                    null       
                ]
            }
        },
        // The smallest positive difference from the date
        "prev": {
            "$min": {
                "$cond": [
                    { "$gt": [ currDate, "$eventDates.eventDate" ] },
                    { "$subtract": [ currDate, "$eventDates.eventDate" ] },
                    null       
                ]
            }
        }
    }},
    // Filter array for "next" and "prev" only
    { "$project": {
        "all": {
            // filtering false from the result array
            "$setDifference": [
                // process each array member with conditions
                { "$map": {
                    "input": "$all",
                    "as": "el",
                    "in": {
                        "$cond": [
                            // Am I equal to one of the identified differences?
                            { "$or": [
                                { "$eq": [ "$$el.diff", "$prev" ] },
                                { "$eq": [ "$$el.diff", "$next" ] }
                            ]},
                            // If so then return me
                            {
                                "_id": "$$el._id",
                                "eventDates": "$$el.eventDates",
                                "otherData": "$$el.otherData",
                                "diff": "$$el.diff",
                                // and set my type by which one I matched
                                "type": {
                                    "$cond": [
                                        { "$eq": [ "$$el.diff", "$prev" ] },
                                        "prev",
                                        "next"
                                    ]
                                }
                            },
                            // if not then return false
                            false
                        ]
                    }
                }},
                // removing all false elements from array
                [false]
            ]
        }
    }},
    // Unwind the array
    { "$unwind": "$all" },
    // Group back pulling fields for "next" and "prev"
    { "$group": {
        "_id": "$_id",
        // matching prev element to the field
        "prev": {
            "$min": {
                "$cond": [
                    { "$eq": [ "$all.type", "prev" ] },
                    "$all",
                    null
                ]
            }
        },
        // matching next element to the field
        "next": {
            "$min": {
                "$cond": [
                    { "$eq": [ "$all.type", "next" ] },
                    "$all",
                    null
                ]
            }
        }
    }}
])

Breakdown

  1. The first step is $unwind as of course the idea here is to find items within the array content of the document across different documents. Ideally you want to match a range of dates to minimize the documents processed first.

  2. The next step is to $group where there are a few objectives aside from the general grouping key to be used:

    1. We want all the possible documents in the grouping as you want this data later, so here we $push. We also want to calculate the "difference" in the date values from the queried date to the "eventDate" on the item. This is a $subtract where the result of a date subtracted from another date is the milliseconds between them. This will be used later as well.

    2. You want the largest negative difference (the one closest to zero) from the current date input. This is determined by keeping only the entries where the queried date is $lt the "eventDate" and then returning the $max value. This is the "next" item difference.

    3. You want the smallest positive difference from the current date input. This applies the same filtering with the queried date $gt the "eventDate", and then returns the $min value. This is the "previous" item difference.

  3. With the array of all documents in the group in place, you now need to "filter" those documents to just return the "next" and "previous" matches by the difference values from the last stage.

    The way we will do this is using the $map operator to inspect the array values and determine matches, and the $setDifference operator to filter out any elements for which we return false instead of the document in the match.

    Processing will be done through $cond where it will return either a document or false. The $or condition is to test whether the elements match "prev" or "next" from the prior stage:

    { "$or": [
        { "$eq": [ "$$el.diff", "$prev" ] },
        { "$eq": [ "$$el.diff", "$next" ] }
    ]},
    

    There is another $cond within the document return, and its job is to "tag" a "type" of "prev" or "next" based on the same basic match condition. This will be used for allocation in a later stage.

  4. The only things left in the array now should be the "prev" and "next" documents from the date we are querying for. In order to get these into a singular form you first $unwind again. Not costly as there are only two documents per group.

  5. The final allocation is under another $group, which has another $cond to either return the document matching the "type" set earlier or a null value where this did not match. The $min accumulator skips null values when a non-null value is present, so it picks out the matched document. This will return a singular value for each of the "prev" and "next" fields in the response.
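
To make the steps above easier to follow, here is a minimal in-memory sketch of the same logic in plain JavaScript, with no MongoDB involved. The function name findPrevNext and the "a"/"b" ids are made up for illustration, and the loop simply tracks the nearest entries directly rather than reproducing the $map / $setDifference filtering.

```javascript
// In-memory sketch of the pipeline logic (hypothetical helper, not a MongoDB API).
function findPrevNext(docs, currDate) {
    const groups = {};
    for (const doc of docs) {
        // Like "$unwind": visit each element of eventDates separately
        for (const entry of doc.eventDates) {
            // Like "$subtract" on two dates: the difference in milliseconds
            const diff = currDate - entry.eventDate;
            const item = { _id: doc._id, eventDates: entry, otherData: doc.otherData, diff: diff };
            // Like "$group" on eventType
            const g = groups[doc.eventType] || (groups[doc.eventType] = { prev: null, next: null });
            // Smallest positive difference: the closest date before currDate
            if (diff > 0 && (g.prev === null || diff < g.prev.diff)) g.prev = item;
            // Largest negative difference: the closest date after currDate
            if (diff < 0 && (g.next === null || diff > g.next.diff)) g.next = item;
        }
    }
    return groups;
}

// Same shape as the sample data, with placeholder ids
const sample = [
    { _id: "a", eventType: "sport", otherData: "something", eventDates: [
        { created: new Date("2015-08-01"), eventDate: new Date("2015-08-01") },
        { created: new Date("2015-08-02"), eventDate: new Date("2015-08-02") },
        { created: new Date("2015-08-03"), eventDate: new Date("2015-08-03") }
    ]},
    { _id: "b", eventType: "sport", otherData: "something", eventDates: [
        { created: new Date("2015-08-04"), eventDate: new Date("2015-08-04") },
        { created: new Date("2015-08-05"), eventDate: new Date("2015-08-05") },
        { created: new Date("2015-08-06"), eventDate: new Date("2015-08-06") }
    ]}
];

const result = findPrevNext(sample, new Date("2015-08-03"));
console.log(result.sport.prev.eventDates.eventDate.toISOString());
console.log(result.sport.next.eventDates.eventDate.toISOString());
```

An entry dated exactly on the queried date has a difference of zero, so it is neither "prev" nor "next", which matches the strict $lt and $gt comparisons in the pipeline.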

Result

The result of course shows the dates for Date("2015-08-02") and Date("2015-08-04") as the previous and next items occurring around the queried date:

{
    "_id" : "sport",
    "prev" : {
        "_id" : ObjectId("55de01a4b64dc3c80673a58d"),
        "eventDates" : {
            "created" : ISODate("2015-08-02T00:00:00Z"),
            "eventDate" : ISODate("2015-08-02T00:00:00Z")
        },
        "otherData" : "something",
        "diff" : NumberLong(86400000),
        "type" : "prev"
    },
    "next" : {
        "_id" : ObjectId("55de01a4b64dc3c80673a58e"),
        "eventDates" : {
            "created" : ISODate("2015-08-04T00:00:00Z"),
            "eventDate" : ISODate("2015-08-04T00:00:00Z")
        },
        "otherData" : "something",
        "diff" : NumberLong(-86400000),
        "type" : "next"
    }
}
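
As a quick check on the "diff" values shown, the arithmetic can be reproduced in plain JavaScript, where subtracting one Date from another also yields milliseconds. The constant MS_PER_DAY and the variable names are just illustrative.

```javascript
// One day in milliseconds
const MS_PER_DAY = 24 * 60 * 60 * 1000;

const queried = new Date("2015-08-03");
// Positive: the event is before the queried date ("prev")
const prevDiff = queried - new Date("2015-08-02");
// Negative: the event is after the queried date ("next")
const nextDiff = queried - new Date("2015-08-04");

console.log(prevDiff / MS_PER_DAY, nextDiff / MS_PER_DAY);
```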

Summary

Another take on this would be to run two separate aggregation operations. One to get those "next" and "previous" difference values without needing to push all the documents into an array. Then of course to use those values in a second aggregation operation to filter out the documents that only match those criteria on each grouping boundary and return those as the "next" and "previous" documents.

It could in fact be many queries with several running in parallel. But that is why the logic is explained here in general, so it gives a guide to develop the process to scale to your needs.
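
A rough sketch of that two-pass idea follows. The helper names buildDiffPipeline and buildFetchPipeline are hypothetical, and the second pass assumes the difference values come back as plain numbers (a NumberLong may need converting first). Treat it as a starting point under those assumptions rather than a tested implementation.

```javascript
// Pass 1 (hypothetical helper): compute only the nearest differences per
// eventType, without pushing every document into an array.
function buildDiffPipeline(currDate) {
    const diff = { "$subtract": [ currDate, "$eventDates.eventDate" ] };
    return [
        { "$unwind": "$eventDates" },
        { "$group": {
            "_id": "$eventType",
            "next": { "$max": { "$cond": [ { "$lt": [ currDate, "$eventDates.eventDate" ] }, diff, null ] } },
            "prev": { "$min": { "$cond": [ { "$gt": [ currDate, "$eventDates.eventDate" ] }, diff, null ] } }
        }}
    ];
}

// Pass 2 (hypothetical helper): turn the differences back into dates
// (eventDate = currDate - diff) and fetch only the matching entries.
function buildFetchPipeline(currDate, group) {
    const dates = [ group.prev, group.next ]
        .filter(function (d) { return d !== null && d !== undefined; })
        .map(function (d) { return new Date(currDate.getTime() - d); });
    return [
        { "$match": { "eventType": group._id, "eventDates.eventDate": { "$in": dates } } },
        { "$unwind": "$eventDates" },
        { "$match": { "eventDates.eventDate": { "$in": dates } } }
    ];
}

// Usage sketch against a live collection (not run here):
// var when = new Date("2015-08-03");
// db.getCollection("Events").aggregate(buildDiffPipeline(when)).forEach(function (group) {
//     printjson(db.getCollection("Events").aggregate(buildFetchPipeline(when, group)).toArray());
// });
```

The first $match in the second pass lets an index on the event date narrow the documents before the $unwind, and the second $match then drops the array entries that are not the "prev" or "next" dates.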

Upvotes: 1
