CoderBC
CoderBC

Reputation: 1392

Moving window average in postgreSQL

I have a data set in a csv file that contains dates, category and values. However, the dates might have gaps. E.g.

Date       |  Category   | Value
2016-01-01   Category A      6
2016-01-02   Category A      7
2016-01-03   Category A      4
2016-01-01   Category B      4
2016-01-01   Category C      16
2016-01-02   Category C      8
2016-01-02   Category D      5

I imported the data in a table in PostgreSQL.

I need to calculate a rolling average for the past 7 days for each category (lets simplify it to past 3 days for this example). However, I need to fill the missing dates for each category as 0. What i tried initially was:

Add blank fields with 0, before calculating the average

Select Seven_day.date,
        coalesce(data.value,Seven_day.blank_count),
        category, 
    from ( select distinct GENERATE_SERIES(t.date-'6 day'::interval,t.date,'1 day'::interval)::date as date,
            0 as blank_count 
        from data t) as Seven_day
    left outer join data on data.date=Seven_day.date

However, this does not generate the proper blank fields. Further it is very slow as my data set is quite big.

Is there a better way to tackle the issue? Is it possible to handle this while creating the table itself? e.g. auto generating dates as series with default value to 0? However, how can i deal with the date,category pair is the main question here.

Upvotes: 4

Views: 715

Answers (2)

Clodoaldo Neto
Clodoaldo Neto

Reputation: 125214

Average for 3 days expandable to any number of days: SQL Fiddle

select *
from (
    select
        date, value, category,
        avg(value) over (
            partition by category
            order by date
            rows between 2 preceding and current row
        ) as average
    from (
        select date::date as date, coalesce(value, 0) as value, category
        from
            t
            right join
            ( -- computed table with all the possible dates x categories
                (
                    select distinct category
                    from t
                ) c
                cross join
                generate_series (
                    (select min(date) - 2 from t),
                    (select max(date) from t),
                    '1 day'
                ) gs(date)
            ) s using(category, date)
    ) s
) s
where date >= (select min(date) from t)
order by date, category
;
    date    | value | category |        average         
------------+-------+----------+------------------------
 2016-01-01 |     6 | A        |     2.0000000000000000
 2016-01-01 |     4 | B        |     1.3333333333333333
 2016-01-01 |    16 | C        |     5.3333333333333333
 2016-01-01 |     0 | D        | 0.00000000000000000000
 2016-01-02 |     7 | A        |     4.3333333333333333
 2016-01-02 |     0 | B        |     1.3333333333333333
 2016-01-02 |     8 | C        |     8.0000000000000000
 2016-01-02 |     5 | D        |     1.6666666666666667
 2016-01-03 |     4 | A        |     5.6666666666666667
 2016-01-03 |     0 | B        |     1.3333333333333333
 2016-01-03 |     0 | C        |     8.0000000000000000
 2016-01-03 |     0 | D        |     1.6666666666666667

Upvotes: 4

CoderBC
CoderBC

Reputation: 1392

I found a mini solution:

Select Seven_day.date,
Seven_day.category as cat,
    coalesce(test.value,Seven_day.blank_count)
from ( select distinct GENERATE_SERIES(t.date-'6 day'::interval,t.date,'1 day'::interval)::date as date, t.category,
        0 as blank_count 
    from test t
     order by t.category,date) as Seven_day
left outer join test on test.date=Seven_day.date and test.category=Seven_day.category
order by cat,date

Upvotes: 0

Related Questions