Reputation: 45390
I have the following documents stored:
{
"date": 1437429603126,
"id": "7c578fe6-5eeb-466c-a79a-628784fd0d16",
"quote": {
"c": "+2.45",
"c_fix": "2.45",
"ccol": "chg",
"cp": "1.89",
"cp_fix": "1.89",
"div": "0.52",
"e": "NASDAQ",
"ec": "+0.58",
"ec_fix": "0.58",
"eccol": "chg",
"ecp": "0.44",
"ecp_fix": "0.44",
"el": "132.65",
"el_cur": "132.65",
"el_fix": "132.65",
"elt": "Jul 20, 5:59PM EDT",
"id": "22144",
"l": "132.07",
"l_cur": "132.07",
"l_fix": "132.07",
"lt": "Jul 20, 4:09PM EDT",
"lt_dts": "2015-07-20T16:09:40Z",
"ltt": "4:09PM EDT",
"pcls_fix": "129.62",
"s": "2",
"t": "AAPL",
"yld": "1.57"
}
}
And I'm looking to run a query that selects the fields quote.t, quote.l, quote.c, and quote.cp, where quote.t is AAPL, ordered by date. The piece that is missing is grouping multiple documents that fall on the same day. The logic I need is: for each day, take the latest document where quote.t = AAPL. So there should only be a single document returned per day, and that document should have the greatest date for that day.
Here is what I have so far; it is still missing the grouping of multiple documents within a single day:
r.db('macd').table('daily_closes').filter({
'quote': {
't': 'AAPL'
}
}).orderBy('date').pluck('date', {
'quote': [
't',
'l',
'c',
'cp'
]
})
Also, I have secondary indexes; how can I use those in the query?
Upvotes: 1
Views: 765
Reputation: 2314
You need to group by day, but you store the date as epoch time in milliseconds, so first you need a way to turn that timestamp into a day. We can then group by that day value, sort each group's reduction array in descending order, and take the first element of that array with nth.
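For example, the sample document's timestamp reduces to a plain day like this (a quick check you can run in the Data Explorer):
// date is in milliseconds but r.epochTime expects seconds
r.epochTime(1437429603126 / 1000).date()
// => Mon Jul 20 2015 00:00:00 GMT+00:00
Putting that into the query: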
r.table('daily_closes').filter({
'quote': {
't': 'AAPL'
}
}).orderBy('date')
.pluck('date', {
'quote': [
't',
'l',
'c',
'cp'
]
})
// group documents that fall on the same day; date is stored in
// milliseconds, so divide by 1000 before handing it to r.epochTime
.group(r.epochTime(r.row('date').div(1000)).date())
// within each group, sort descending by date and keep the latest document
.orderBy(r.desc('date')).nth(0)
You may get something like this:
{
"group": Mon Jul 20 2015 00:00:00 GMT+00:00 ,
"reduction": {
"_date": Mon Jul 20 2015 00:00:00 GMT+00:00 ,
"date": 1437429603126 ,
"quote": {
"c": "+2.45" ,
"cp": "1.89" ,
"l": "132.07" ,
"t": "AAPL"
}
}
}
So let's reduce the noise: we will ungroup it. Without ungroup, you are operating on a sub-stream of each group; once you ungroup, each group becomes a single document in an ordinary sequence. We also only care about the data inside reduction, because that contains the single document we kept for each day. Here is the final query:
r.table('daily_closes').filter({
'quote': {
't': 'AAPL'
}
}).orderBy('date')
.pluck('date', {
'quote': [
't',
'l',
'c',
'cp'
]
})
.group(r.epochTime(r.row('date').div(1000)).date())
.orderBy(r.desc('date')).nth(0)
.ungroup()
.getField('reduction')
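After ungroup and getField('reduction'), each day collapses to the plain plucked document, so with the sample data above the result would look something like:
[
  {
    "date": 1437429603126,
    "quote": {
      "c": "+2.45",
      "cp": "1.89",
      "l": "132.07",
      "t": "AAPL"
    }
  }
]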
Now, let's use an index.
First, filter is slow on large tables, and orderBy without an index has to load the whole sequence into memory (it is limited to 100,000 documents by default). Let's switch to getAll with an index. However, orderBy with an index can only be applied directly to a table, not chained after a getAll. So we will use this trick instead:
Create a compound index on both values:
r.table('daily_closes').indexCreate('quote_date', [r.row('quote')('t'), r.row('date')])
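Note that index creation runs in the background, so querying right away may fail with an "index not ready" error; you can block until the index is built with indexWait:
r.table('daily_closes').indexWait('quote_date')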
Now, we use between:
r.table('daily_closes')
// spans all dates for the ticker 'AAPL' in the compound index
.between(['AAPL', r.minval], ['AAPL', r.maxval], {index: 'quote_date'})
.pluck('date', {
'quote': [
't',
'l',
'c',
'cp'
]
})
.group(r.epochTime(r.row('date').div(1000)).date())
.orderBy(r.desc('date')).nth(0)
.ungroup()
.getField('reduction')
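For completeness, here is a minimal sketch of running the final query from the official JavaScript driver; the connection options (host, port, db) are assumptions for a default local install:
var r = require('rethinkdb');

r.connect({host: 'localhost', port: 28015, db: 'macd'}, function(err, conn) {
  if (err) throw err;
  r.table('daily_closes')
    .between(['AAPL', r.minval], ['AAPL', r.maxval], {index: 'quote_date'})
    .pluck('date', {quote: ['t', 'l', 'c', 'cp']})
    .group(r.epochTime(r.row('date').div(1000)).date())
    .orderBy(r.desc('date')).nth(0)
    .ungroup()
    .getField('reduction')
    .run(conn, function(err, result) {
      if (err) throw err;
      // result is a plain array: one document per day,
      // each being the latest quote of that day
      console.log(result);
      conn.close();
    });
});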
I hope this helps.
Upvotes: 3