Rico Chen
Rico Chen

Reputation: 2298

How to handle partial week data grouping in mongodb

I have some docs (daily open price for a stock) like the followings:

/* 0 */
{
    "_id" : ObjectId("54d65597daf0910dfa8169b0"),
    "D" : ISODate("2014-12-29T00:00:00.000Z"),
    "O" : 104.98
}

/* 1 */
{
    "_id" : ObjectId("54d65597daf0910dfa8169af"),
    "D" : ISODate("2014-12-30T00:00:00.000Z"),
    "O" : 104.73
}

/* 2 */
{
    "_id" : ObjectId("54d65597daf0910dfa8169ae"),
    "D" : ISODate("2014-12-31T00:00:00.000Z"),
    "O" : 104.51
}

/* 3 */
{
    "_id" : ObjectId("54d65597daf0910dfa8169ad"),
    "D" : ISODate("2015-01-02T00:00:00.000Z"),
    "O" : 103.75
}

/* 4 */
{
    "_id" : ObjectId("54d65597daf0910dfa8169ac"),
    "D" : ISODate("2015-01-05T00:00:00.000Z"),
    "O" : 102.5
}

and I want to aggregate the records by week so I can get the weekly average open price. My first attempt is to use:

db.ohlc.aggregate({
    $match: {
        D: {
            $gte: new ISODate('2014-12-28')
        }
    }
}, {
    $project: {
        year: {
            $year: '$D'
        },
        week: {
            $week: '$D'
        },
        O: 1
    }

}, {
    $group: {
        _id: {
            year: '$year',
            week: '$week'
        },
        O: {
            $avg: '$O'
        }
    }
}, {
    $sort: {
        _id: 1
    }
})

Bu I soon realized the result is incorrect as both the last week of 2014 (week number 52) and the first week of 2015 (week number 0) are partial weeks. With this aggregation I would have an average price for 12/29-12/31/2014 and another one for 01/02/2015 (which is the only trading date in the first week of 2015) but in my application I would need to group the data from 12/29/2015 through 01/02/2015. Any advice?

Upvotes: 0

Views: 662

Answers (3)

flower
flower

Reputation: 11

I use this for candelization; with allowDiskUsage, out and some date filters it works great. Maybe you can adopt the grouping?

db.getCollection('market').aggregate(
[
    { $match: { date: { $exists: true } } },
    { $sort: { date: 1 } },
    { $project: { _id: 0, date: 1, rate: 1, amount: 1, tm15: { $mod: [ "$date", 900 ] } } }, 
    { $project: { _id: 0, date: 1, rate: 1, amount: 1, candleDate: { $subtract: [ "$date", "$tm15" ] } } },
    { $group: { _id: "$candleDate", open: { $first: '$rate' }, low: { $min: '$rate' }, high: { $max: '$rate' }, close: { $last: '$rate' }, volume: { $sum: '$amount' }, trades: { $sum: 1 } } }
])

Upvotes: 1

Santiago Alessandri
Santiago Alessandri

Reputation: 6855

From my experience, this is not a really good approach to tackle the problem. Why? This will definitely not scale, the amount of computation needed is quite exhausting, specially to do the grouping.

What I would do in your situation is to move part of the application logic to the documents in the DB.

My first approach would be to add a "week" field that will state the previous (or next) Sunday of the date the sample belongs to. This is quite easy to do at the moment of insertion. Then you can simply run the aggregation method grouping by that field. If you want more performance, add an index for { symbol : 1, week : 1 } and do a sort in the aggregate.

My second approach, which would be if you plan on making a lot this type of aggregations, is basically having documents that group the samples in a weekly manner. Like this:

{
    week : <Day Representing Week>,
    prices: [
       { Day Sample }, ...
    ]
}

Then you can simply work on those documents directly. This will help you reduce your indexes in a significant manner, thus speeding things up.

Upvotes: 0

Rico Chen
Rico Chen

Reputation: 2298

To answer my own question, the trick is to calculate the number of weeks based on a reference date (1970-01-04) and group by that number. You can check out my new post at http://midnightcodr.github.io/2015/02/07/OHLC-data-grouping-with-mongodb/ for details.

Upvotes: 1

Related Questions