Reputation: 1392
I have a data set in a csv file that contains dates, category and values. However, the dates might have gaps. E.g.
Date | Category | Value
2016-01-01 Category A 6
2016-01-02 Category A 7
2016-01-03 Category A 4
2016-01-01 Category B 4
2016-01-01 Category C 16
2016-01-02 Category C 8
2016-01-02 Category D 5
I imported the data in a table in PostgreSQL.
I need to calculate a rolling average for the past 7 days for each category (lets simplify it to past 3 days for this example). However, I need to fill the missing dates for each category as 0. What i tried initially was:
Add blank fields with 0, before calculating the average
Select Seven_day.date,
coalesce(data.value,Seven_day.blank_count),
category,
from ( select distinct GENERATE_SERIES(t.date-'6 day'::interval,t.date,'1 day'::interval)::date as date,
0 as blank_count
from data t) as Seven_day
left outer join data on data.date=Seven_day.date
However, this does not generate the proper blank fields. Further it is very slow as my data set is quite big.
Is there a better way to tackle the issue? Is it possible to handle this while creating the table itself? e.g. auto generating dates as series with default value to 0? However, how can i deal with the date,category pair is the main question here.
Upvotes: 4
Views: 715
Reputation: 125214
Average for 3 days expandable to any number of days: SQL Fiddle
select *
from (
select
date, value, category,
avg(value) over (
partition by category
order by date
rows between 2 preceding and current row
) as average
from (
select date::date as date, coalesce(value, 0) as value, category
from
t
right join
( -- computed table with all the possible dates x categories
(
select distinct category
from t
) c
cross join
generate_series (
(select min(date) - 2 from t),
(select max(date) from t),
'1 day'
) gs(date)
) s using(category, date)
) s
) s
where date >= (select min(date) from t)
order by date, category
;
date | value | category | average
------------+-------+----------+------------------------
2016-01-01 | 6 | A | 2.0000000000000000
2016-01-01 | 4 | B | 1.3333333333333333
2016-01-01 | 16 | C | 5.3333333333333333
2016-01-01 | 0 | D | 0.00000000000000000000
2016-01-02 | 7 | A | 4.3333333333333333
2016-01-02 | 0 | B | 1.3333333333333333
2016-01-02 | 8 | C | 8.0000000000000000
2016-01-02 | 5 | D | 1.6666666666666667
2016-01-03 | 4 | A | 5.6666666666666667
2016-01-03 | 0 | B | 1.3333333333333333
2016-01-03 | 0 | C | 8.0000000000000000
2016-01-03 | 0 | D | 1.6666666666666667
Upvotes: 4
Reputation: 1392
I found a mini solution:
Select Seven_day.date,
Seven_day.category as cat,
coalesce(test.value,Seven_day.blank_count)
from ( select distinct GENERATE_SERIES(t.date-'6 day'::interval,t.date,'1 day'::interval)::date as date, t.category,
0 as blank_count
from test t
order by t.category,date) as Seven_day
left outer join test on test.date=Seven_day.date and test.category=Seven_day.category
order by cat,date
Upvotes: 0