Benjamin M

Reputation: 24547

Group first by category, then by week or month

My documents look like this:

{
    category: "1",
    timestamp: ISODate("2016-07-16T00:00:00.000Z"),
    amount: 0
},
{
    category: "1",
    timestamp: ISODate("2016-08-18T00:00:00.000Z"),
    amount: 15
},
{
    category: "1",
    timestamp: ISODate("2016-08-01T00:00:00.000Z"),
    amount: 5
},
{
    category: "2",
    timestamp: ISODate("2016-08-18T00:00:00.000Z"),
    amount: 10
}

Now I'd like to first group by category (which already works):

{ "$match" : { "timestamp" : { "$gt" : FROM , "$lt" : TO }}},
{ "$sort" : { "timestamp" : 1 }},
{ "$group" : {
    "_id" : "$category",
    "data" : { "$push" : { "timestamp" : "$timestamp" , "amount" : "$amount" }}
}}

Then I'd like to group the objects within the data array to get the max amount for each week (or month, depending on user input).

The result then should look something like this (when grouping by month):

{
    _id: "1",
    data: [
        {
            timestamp: "2016-07",    // could also be an ISODate with
            amount: 0                // first (or last) day of month
        },                           // if that makes things easier
        {
            timestamp: "2016-08",
            amount: 15
        }
    ]
},
{
    _id: "2",
    data: [
        {
            timestamp: "2016-08",
            amount: 10
        }
    ]
}

I tried to unwind the data array and then group again, but that resulted in a total mess.

Hope you've got some nice idea / solution to get this working.

EDIT: Additional Question:

I've put an index on category which works just fine for $match. Would it also be useful to put an index on timestamp for sorting (because the insertion ordering can differ from the timestamp ordering) or won't this index have any effect within the aggregation?

Upvotes: 4

Views: 609

Answers (2)

Benjamin M

Reputation: 24547

I've taken Styvane's answer (thanks again!) and simplified it a bit:

{$match: { timestamp: { $gt: FROM , $lt: TO }}},
{$group: {
    _id: {
        id: "$category",
        timestamp: { $concat: [
            { $toLower: { $year:"$timestamp" } },
            "-",
            { $toLower: { $month: "$timestamp" } }
        ] }
    },
    amount: { $max: "$amount" }
}},
{$sort: { "_id.timestamp": 1 } },
{$group: {
    _id: "$_id.id",
    data: { $push: { timestamp: "$_id.timestamp", amount: "$amount" } }
}}

I tried to $sort before the first $group, but that sometimes gave unexpected results, so I placed the $sort between the two $group stages. This way, an index on timestamp no longer matters for the sort.
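Since the question asks for week or month depending on user input, here is a minimal sketch of how the pipeline could be built for either unit. It swaps the $concat/$toLower key for $dateToString (assuming MongoDB 3.0 or later), which also zero-pads the month; buildPipeline and its unit parameter are hypothetical names, not part of any API.

```javascript
// Hypothetical helper: builds the aggregation pipeline for a grouping unit.
// "%Y-%m" yields keys like "2016-08"; "%Y-%U" yields year plus week of year (00-53).
function buildPipeline(from, to, unit) {
    let format;
    if (unit === "month") {
        format = "%Y-%m";
    } else if (unit === "week") {
        format = "%Y-%U";
    } else {
        throw new Error("unit must be 'month' or 'week'");
    }
    return [
        { $match: { timestamp: { $gt: from, $lt: to } } },
        { $group: {
            _id: {
                id: "$category",
                timestamp: { $dateToString: { format: format, date: "$timestamp" } }
            },
            amount: { $max: "$amount" }
        } },
        { $sort: { "_id.timestamp": 1 } },
        { $group: {
            _id: "$_id.id",
            data: { $push: { timestamp: "$_id.timestamp", amount: "$amount" } }
        } }
    ];
}

const pipeline = buildPipeline(new Date("2016-07-01"), new Date("2016-09-01"), "month");
// In the mongo shell this would be passed to db.collection.aggregate(pipeline).
```

Only the format string changes between the two units, so the rest of the pipeline stays identical.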

Upvotes: 3

Sede

Reputation: 61253

After the $sort stage, you need to $group by "category" then $unwind the "data" field.

var group1 = { "$group": { 
    "_id": "$category", 
    "data": { 
        "$push": { 
            "timestamp": "$timestamp", 
            "amount": "$amount"
        }
    }
}};

var unwind = { "$unwind": "$data"};

From there, you need to $group your documents again, but this time by both the _id field and the timestamp. With the $toLower operator you can convert the year and month values to strings, which you can then concatenate using the $concat operator.
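One thing worth noting about this string key: $toLower on a number gives an unpadded string, so August becomes "2016-8" rather than the "2016-08" shown in the question, and such keys do not sort chronologically as strings. A plain JavaScript sketch of the effect (unpaddedKey is a hypothetical helper that only mimics the expression, it is not a MongoDB API):

```javascript
// Mimics the key built by $concat + $toLower in plain JavaScript (no padding).
function unpaddedKey(date) {
    return String(date.getUTCFullYear()) + "-" + String(date.getUTCMonth() + 1);
}

const keys = [
    new Date("2016-10-05T00:00:00Z"),
    new Date("2016-02-05T00:00:00Z"),
    new Date("2016-08-18T00:00:00Z")
].map(unpaddedKey);

// The default sort() compares strings, much like sorting on a string field.
const sorted = keys.slice().sort();
console.log(sorted); // [ '2016-10', '2016-2', '2016-8' ]
```

October ends up before February, so a zero-padded key is the safer choice if the result is sorted on it.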

You also return the maximum of that group with $max, since the question asks for the max amount per period.

var group2 = { "$group": { 
    "_id": { 
        "id": "$_id", 
        "timestamp": { 
            "$concat": [ 
                { "$toLower": { "$year": "$data.timestamp" } }, 
                "-", 
                { "$toLower": { "$month": "$data.timestamp" } }
            ]
        }
    }, 
    "amount": { "$max": "$data.amount" }
}};

The last stage is another $group stage, where you simply group your documents by the previous _id.id value and use the $push accumulator operator to return the array of data.

var group3 = { "$group": { 
    "_id": "$_id.id", 
    "data": { 
        "$push": { 
            "timestamp": "$_id.timestamp", 
            "amount": "$amount" 
        }
    }
}};

Your final pipeline will look like this:

db.collection.aggregate(
    [
        // $match and $sort stages go here
        group1,
        unwind,
        group2,
        group3
    ]
)
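To check the result shape without a database, the same grouping can be mimicked in plain JavaScript. This is only a sketch: maxPerMonth is a hypothetical helper, not part of MongoDB. It zero-pads the month and keeps the maximum per month, as the question asks.

```javascript
// Sample documents from the question, with plain Date objects instead of ISODate.
const docs = [
    { category: "1", timestamp: new Date("2016-07-16T00:00:00.000Z"), amount: 0 },
    { category: "1", timestamp: new Date("2016-08-18T00:00:00.000Z"), amount: 15 },
    { category: "1", timestamp: new Date("2016-08-01T00:00:00.000Z"), amount: 5 },
    { category: "2", timestamp: new Date("2016-08-18T00:00:00.000Z"), amount: 10 }
];

// Hypothetical helper: groups by category, then by zero-padded "YYYY-MM",
// keeping the maximum amount seen in each month.
function maxPerMonth(documents) {
    const byCategory = new Map();
    for (const doc of documents) {
        const month = doc.timestamp.toISOString().slice(0, 7); // e.g. "2016-08"
        if (!byCategory.has(doc.category)) {
            byCategory.set(doc.category, new Map());
        }
        const months = byCategory.get(doc.category);
        const previous = months.has(month) ? months.get(month) : -Infinity;
        months.set(month, Math.max(previous, doc.amount));
    }
    // Emit one document per category, with months in ascending order.
    return Array.from(byCategory, ([category, months]) => ({
        _id: category,
        data: Array.from(months.keys()).sort().map((month) => ({
            timestamp: month,
            amount: months.get(month)
        }))
    }));
}

const result = maxPerMonth(docs);
console.log(JSON.stringify(result));
```

For the sample data this gives 0 for "2016-07" and 15 for "2016-08" in category "1", and 10 for "2016-08" in category "2".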

This query can also be written with the $facet operator, which is available from MongoDB 3.4 on.

db.collection.aggregate([
    // $match and $sort stages go here
    { "$facet": { "data": [ group1, unwind, group2, group3 ] } }
])

Upvotes: 1
