Florian Ludewig
Florian Ludewig

Reputation: 5982

Fill missing dates in records

I have a collection of ProductViews:

{
    productId: "5b8c0f3204a10228b00a1745",
    createdAt: "2018-09-07T17:18:40.759Z"
}

And I have a query for fetching the daily views for a specific product:

ProductView.aggregate([
    { $match: { productId } },
    { $project: { day: { $substr: ["$createdAt", 0, 10] } } },
    {
        $group: {
            _id: "$day",
            count: { $sum: 1 },
            time: { $avg: "$createdAt" },
        }
    },
    { $sort: { _id: 1 } },
    {
        $project: {
            date: '$_id',
            views: '$count',
        },
    },
]).exec((err, result) => ...)

Which currently gives:

[
    { date: '2018-09-01', views: 1 },
    { date: '2018-09-02', views: 3 },
    { date: '2018-09-04', views: 2 },
    { date: '2018-09-05', views: 5 },
    // ...
]

Issue:

The issue is, that this aggregation does not return { date: '2018-09-03', views: 0 } for days with 0 views. This results in incorrect displaying of the data:

Graph of incorrectly displayed data

Results should look like:

[
    { date: '2018-09-01', views: 1 },
    { date: '2018-09-02', views: 3 },
    { date: '2018-09-03', views: 0 }, // <=
    { date: '2018-09-04', views: 2 },
    { date: '2018-09-05', views: 5 },
    // ...
]

P.S.: It would be perfect to pass in the start and end dates to output results based on this range

Upvotes: 11

Views: 5354

Answers (5)

Xavier Guihot
Xavier Guihot

Reputation: 61666

Starting in Mongo 5.1, it's a perfect use case for the new $densify aggregation operator:

// { date: ISODate("2018-09-01"), views: 1 }
// { date: ISODate("2018-09-02"), views: 3 }
// { date: ISODate("2018-09-04"), views: 2 }
// { date: ISODate("2018-09-05"), views: 5 }
db.collection.aggregate([
  { $densify: {
    field: "date",
    range: { step: 1, unit: "day", bounds: "full" }
  }},
  { $set: { views: { $cond: [ { $not: ["$views"] }, 0, "$views" ] } } }
])
// { date: ISODate("2018-09-01"), views: 1 }
// { date: ISODate("2018-09-02"), views: 3 }
// { date: ISODate("2018-09-03"), views: 0 } <=
// { date: ISODate("2018-09-04"), views: 2 }
// { date: ISODate("2018-09-05"), views: 5 }

This:

  • densifies documents ($densify) by creating new documents in a sequence of documents where certain values for a field (in our case field: "date") are missing:
    • the step for our densification is 1 day: range: { step: 1, unit: "day", ... }
    • and we densify within the range of dates defined by our documents: bounds: "full"
  • finally sets ($set) views to 0 only for new documents included during the densify stage ({ views: { $cond: [ { $not: ["$views"] }, 0, "$views" ] })

Of course, to make this work with your specific example, you can switch from dates to strings with $dateToString ($dateToString: { format: "%Y-%m-%d", date: "$date" }) and back to dates with $dateFromString ($dateFromString: { dateString: "$date" })


And concerning your P.S. (pass in the start and end dates to output results based on this range), you can replace bounds: "full" with bounds: [ISODate("2018-08-25"), ISODate("2018-09-07")]

Upvotes: 3

Sede
Sede

Reputation: 61225

I would suggest you add the missing date client side if it's only one or two and the number of document to process is small.

That being said, the following pipeline only works on MongoDB 4.0+ but with a little effort, we can make it work in 3.6.

[
    {
        $group: {
            _id: null,
            dates: {
                $push: {
                    $let: {
                        vars: {
                            date: {
                                $dateToParts: {
                                    date: {
                                        $toDate: "$createdAt"
                                    }
                                }
                            }
                        },
                        in: {
                            $toDouble: {
                                $dateFromParts: {
                                    year: "$$date.year",
                                    month: "$$date.month",
                                    day: "$$date.day"
                                }
                            }
                        }
                    }
                }
            }
        }
    },
    {
        $addFields: {
            startDate: {
                $divide: [
                    {
                        $min: "$dates"
                    },
                    1000
                ]
            },
            endDate: {
                $divide: [
                    {
                        "$max": "$dates"
                    },
                    1000
                ]
            }
        }
    },
    {
        $addFields: {
            dates: {
                $map: {
                    input: {
                        $concatArrays: [
                            "$dates",
                            {
                                $setDifference: [
                                    {
                                        $map: {
                                            input: {
                                                $range: [
                                                    {
                                                        $toDouble: "$startDate"
                                                    },
                                                    {
                                                        $toDouble: "$endDate"
                                                    },
                                                    24*60*60
                                                ]
                                            },
                                            in: {
                                                $multiply: [
                                                    "$$this",
                                                    1000
                                                ]
                                            }
                                        }
                                    },
                                    "$dates"
                                ]
                            }
                        ]
                    },
                    in: {
                        $toDate: "$$this"
                    }
                }
            }
        }
    },
    {
        "$unwind": "$dates"
    },
    {
        "$group": {
            _id: "$dates",
            views: {
                $sum: 1
            }
        }
    },
    {
        "$sort": {
            _id: -1
        }
    }
]

Upvotes: 2

Ashh
Ashh

Reputation: 46441

With some javascript and aggregation trick.

You need to first find the dates between the date range provided.

function getDates(startDate, stopDate) {
  var dateArray = []
  var currentDate = moment(startDate)
  var stopDate = moment(stopDate)
  while (currentDate <= stopDate) {
    dateArray.push(moment(currentDate).format('YYYY-MM-DD'))
    currentDate = moment(currentDate).add(1, 'days')
  }
  return dateArray
}

const dummyArray = getDates('2018-09-01', '2018-09-05')
dummyArray = [ "2018-09-01", "2018-09-02", "2018-09-03", "2018-09-04", "2018-09-05" ]

Now with the below aggregation you can find for the dates which are not available in the database.

db.collection.aggregate([
  { "$match": { productId } },
  { "$group": {
    "_id": { "$substr": ["$createdAt", 0, 10] },
    "count": { "$sum": 1 },
    "time": { "$avg": "$createdAt" },
  }},
  { "$sort": { "_id": 1 } },
  { "$project": { "date": "$_id", "views": "$count" }},
  { "$group": { "_id": null, "data": { "$push": "$$ROOT" }}},
  { "$project": {
    "data": {
      "$map": {
        "input": dummyArray,
        "in": {
          "k": "$$this",
          "v": { "$cond": [{ "$in": ["$$this", "$data.date" ] }, 1, 0 ] }
        }
      }
    }
  }},
  { "$unwind": "$data" },
  { "$group": { "_id": "$data.k", "count": { "$sum": "$data.v" }}}
])

and the output will be

[
    { date: '2018-09-01', views: 1 },
    { date: '2018-09-02', views: 3 },
    { date: '2018-09-03', views: 0 },
    { date: '2018-09-04', views: 2 },
    { date: '2018-09-05', views: 5 }
]

Upvotes: 3

s7vr
s7vr

Reputation: 75924

Your question is similar to post from 2014.

All the answers provided at that post are valid and it would be much simpler if you can generate missing days in your application code.

Since you have asked for mongodb solution and lot has changed from 2014 I created a new aggregation pipeline that you can use with 3.6 version.

ProductView.aggregate([
   -- convert the string date into date type for date calcualtions. can avoid this step if you can store the date as date type in collection
    {"$addFields":{"createdAt":{"$dateFromString":{"dateString":"$createdAt"}}}},
      -- strip the time part so we can add whole milliseconds from epoch to calculate next day
    {"$project":{
        "day":{"$dateFromParts":{"year":{"$year":"$createdAt"},"month":{"$month":"$createdAt"},"day":{"$dayOfMonth":"$createdAt"}}}
    }},
      -- generate two sets of data, one that has count by day, other that has unique days, min day and max day
    {"$facet":{
        "daycounts":[{"$group":{"_id":"$day","count":{"$sum":1}}}],
        "maxmindays":[
          {"$group":{
             "_id":null,
             "days":{"$addToSet":"$day"},
             "minday":{"$min":{"$divide":[{"$subtract":["$day",new Date("1-1-1970")]},1000]}},
             "maxday":{"$max":{"$divide":[{"$subtract":["$day",new Date("1-1-1970")]},1000]}}
           }}
        ]
    }},
    {"$project":{
        "data":{
          "$let":{
            "vars":{"maxminday":{"$arrayElemAt":["$maxmindays",0]}},
            "in":{
              -- $range - iterate from min date to max date one day at a time
              "$map":{
                "input":{"$range":["$$maxminday.minday",{"$add": ["$$maxminday.maxday", 60*60*24]},60*60*24]},
                "as":"r",
                "in": {
              -- convert back to milliseconds to get the day
                  "$let":{
                    "vars":{"current":{"$add": [new Date(0), {"$multiply":["$$r", 1000 ]}]}},
                    "in":{
              -- check if the day is in the collection, if yes lookup view inside the daycount facet to get the matching count, else set the view to zero
                      "$cond":[
                        {"$in":["$$current","$$maxminday.days"]},
                        {
                          "date":{"$substr":["$$current",0,10]},
                          "views":{"$let":{"vars":{"daycount":{"$arrayElemAt":["$daycounts",{"$indexOfArray":["$daycounts._id","$$current"]}]}},"in":"$$daycount.count"}}
                        },
                        {"date":{"$substr":["$$current",0,10]},"views":0}
                      ]
                    }
                  }
                }
              }
            }
          }
        }
    }},
    -- flatten the array of data
    {"$unwind":"$data"},
    -- promote the data to top
    {"$replaceRoot":{newRoot:"$data"}}
])

Upvotes: 3

mickl
mickl

Reputation: 49945

You need few additional stages to return default values. First of all you need to use $group with _id set to null to collect all results in one document. Then you can use $map with an array of days as an input. Inside that $map you can use $indexOfArray to find if that date exists in your current result set. If yes (index != -1) then you can return that value, otherwise you need to return default subdocument with views set to 0. Then you can use $unwind to get back a list of documents and $replaceRoot to promote nested stats to a top level.

ProductView.aggregate([
    { $match: { productId: '5b8c0f3204a10228b00a1745' } },
    { $project: { day: { $substr: ["$createdAt", 0, 10] } } },
    {
        $group: {
            _id: "$day",
            count: { $sum: 1 },
            time: { $avg: "$createdAt" },
        }
    },
    { $sort: { _id: 1 } },
    {
        $project: {
            date: '$_id',
            views: '$count',
        },
    },
    {
        $group: {
            _id: null,
            stats: { $push: "$$ROOT" }
        }
    },
    {
        $project: {
            stats: {
                $map: {
                    input: [ "2018-09-01", "2018-09-02", "2018-09-03", "2018-09-04", "2018-09-05" ],
                    as: "date",
                    in: {
                        $let: {
                            vars: { dateIndex: { "$indexOfArray": [ "$stats._id", "$$date" ] } },
                            in: { 
                                $cond: {
                                    if: { $ne: [ "$$dateIndex", -1 ] },
                                    then: { $arrayElemAt: [ "$stats", "$$dateIndex" ] },
                                    else: { _id: "$$date", date: "$$date", views: 0 }
                                } 
                            }
                        }
                    }
                }
            }
        }
    },
    {
        $unwind: "$stats"
    },
    {
        $replaceRoot: {
            newRoot: "$stats"
        }
    }
]).exec((err, result) => ...)

You can generate a static list of dates in your application logic using simple loop. I believe that's possible in MongoDB as well (using $range) but it might complicate this aggregation pipeline. Let me know if you're fine with that or you want to try to generate that array of dates in MongoDB.

Upvotes: 16

Related Questions