Reputation: 509
Can I group by every n days in Postgres?
If we have:
date | price
2018-01-01 | 10
2018-01-02 | 11
2018-01-03 | 10.5
.....
Something like grouping every 10 days and getting the average of the price column.
Upvotes: 1
Views: 426
Reputation: 164
How about this? It buckets consecutive 10-day periods regardless of gaps.
-- Sample data: one row per day, starting today
CREATE TABLE date (
  date  DATE NOT NULL,
  price DOUBLE PRECISION NOT NULL
);

INSERT INTO date (date, price)
SELECT (now()::DATE) + s.i,
       s.i::DOUBLE PRECISION
FROM generate_series(0, 1000) AS s(i);

-- Bucket = whole days since the epoch, integer-divided by 10
SELECT ((extract(EPOCH FROM date) / (60 * 60 * 24))::BIGINT) / 10 AS bucket
     , avg(price) AS average_price
FROM date
GROUP BY 1
ORDER BY 1;
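If you want each bucket labelled with its start date rather than a bare index, one possible variation on the same idea (a sketch, not part of the original query, using the same date table as above) is to multiply the bucket number back out into days and add it to the epoch date:

-- Sketch: same 10-day bucketing, labelled with each bucket's start date
SELECT date '1970-01-01'
       + (((extract(EPOCH FROM date) / (60 * 60 * 24))::BIGINT / 10) * 10)::INT AS bucket_start
     , avg(price) AS average_price
FROM date
GROUP BY 1
ORDER BY 1;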
Upvotes: 2
Reputation: 16377
This is pretty brute force, and I'll bet a dollar it's wickedly inefficient due to the range conditions (>= and <) in the join, but notionally it does what you want:
with intervals as (
  select start_date, start_date + interval '10 days' as end_date
  from generate_series (
    (select min (date) from price_data),
    (select max (date) from price_data),
    interval '10 days') gs (start_date)
)
select
  i.start_date, sum (p.price) / 10 as average
from
  price_data p
  join intervals i on
    p.date >= i.start_date and
    p.date < i.end_date
group by
  i.start_date
This looks a whole lot uglier, but I suspect on large datasets it would run a lot faster:
with intervals as (
  select
    start_date::date as start_date,
    (start_date + interval '10 days')::date as end_date
  from generate_series (
    (select min (date) from price_data),
    (select max (date) from price_data),
    interval '10 days') gs (start_date)
),
exploded_intervals as (
  select
    start_date + i as snapshot_date, start_date, end_date
  from
    intervals i
    cross join generate_series (0, 9) gs (i)
)
select
  i.start_date, sum (p.price) / 10 as average
from
  price_data p
  join exploded_intervals i on
    p.date = i.snapshot_date
group by
  i.start_date
I don't promise these are the best way to do it, but they are ways to do it.
In a nutshell, I am taking the minimum and maximum dates in your dataset and chunking them into 10-day intervals. This does presuppose that the clock on "every 10 days" starts on your first date.
From there, I am grouping the actual data into each of these date buckets, summing the prices and dividing by 10. If you have missing dates, it should account for that. If you have duplicates on the same day... well, then it's going to artificially inflate your "average." If you define a rule on how to deal with dupes, that is easy enough to manage (see the sketch below).
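For instance, one possible rule (a sketch, not part of the original answer, and still assuming the same price_data table) is to average the duplicates for each day down to a single row before bucketing:

-- Sketch: collapse duplicate dates to one averaged row per day, then bucket as before
with daily as (
  select date, avg (price) as price
  from price_data
  group by date
),
intervals as (
  select
    start_date::date as start_date,
    (start_date + interval '10 days')::date as end_date
  from generate_series (
    (select min (date) from daily),
    (select max (date) from daily),
    interval '10 days') gs (start_date)
)
select
  i.start_date, sum (d.price) / 10 as average
from
  daily d
  join intervals i on
    d.date >= i.start_date and
    d.date < i.end_date
group by
  i.start_date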
Upvotes: 0
Reputation: 306
Example data:
ugautil=> select id,date, price from sales order by 1 limit 30;
id | date | price
-----+------------+--------
569 | 2018-01-01 | 296.01
570 | 2018-01-02 | 409.50
571 | 2018-01-03 | 46.73
572 | 2018-01-04 | 302.13
573 | 2018-01-05 | 313.83
574 | 2018-01-06 | 302.68
575 | 2018-01-07 | 359.53
576 | 2018-01-08 | 348.60
577 | 2018-01-09 | 376.09
578 | 2018-01-10 | 23.71
579 | 2018-01-11 | 470.93
580 | 2018-01-12 | 409.37
581 | 2018-01-13 | 160.95
582 | 2018-01-14 | 22.04
583 | 2018-01-15 | 295.15
584 | 2018-01-16 | 475.42
585 | 2018-01-17 | 399.37
586 | 2018-01-18 | 394.43
587 | 2018-01-19 | 91.97
588 | 2018-01-20 | 27.38
589 | 2018-01-21 | 286.23
590 | 2018-01-22 | 57.81
591 | 2018-01-23 | 486.14
592 | 2018-01-24 | 10.30
593 | 2018-01-25 | 423.67
594 | 2018-01-26 | 169.94
595 | 2018-01-27 | 152.08
596 | 2018-01-28 | 344.42
597 | 2018-01-29 | 448.63
598 | 2018-01-30 | 360.33
(30 rows)
Picking Jan 1, 2018 as the start date, every 10 days gives us an index number.
Only looking at the first 3 groups in January:
ugautil=> select floor((extract(epoch from date) - extract(epoch from date('2018-01-01')))/86400/10) as "ten_day_index", round(avg(price),2) from sales group by 1 order by 1 limit 3;
ten_day_index | round
---------------+--------
0 | 277.88
1 | 274.70
2 | 273.96
(3 rows)
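As a side note (not part of the original answer), the same index can be computed with plain date arithmetic, since subtracting two dates in Postgres yields a whole number of days; this sketch assumes every row falls on or after the chosen start date, so the integer division truncates in the right direction:

select (date - date '2018-01-01') / 10 as ten_day_index, round(avg(price),2) from sales group by 1 order by 1 limit 3;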
ugautil=> delete from sales where id >= 569 and id <= 576;
DELETE 8
ugautil=> select id,date, price from sales order by 1 limit 30;
id | date | price
-----+------------+--------
577 | 2018-01-09 | 376.09
578 | 2018-01-10 | 23.71
579 | 2018-01-11 | 470.93
580 | 2018-01-12 | 409.37
581 | 2018-01-13 | 160.95
582 | 2018-01-14 | 22.04
583 | 2018-01-15 | 295.15
584 | 2018-01-16 | 475.42
585 | 2018-01-17 | 399.37
586 | 2018-01-18 | 394.43
587 | 2018-01-19 | 91.97
588 | 2018-01-20 | 27.38
589 | 2018-01-21 | 286.23
590 | 2018-01-22 | 57.81
591 | 2018-01-23 | 486.14
592 | 2018-01-24 | 10.30
593 | 2018-01-25 | 423.67
594 | 2018-01-26 | 169.94
595 | 2018-01-27 | 152.08
596 | 2018-01-28 | 344.42
597 | 2018-01-29 | 448.63
598 | 2018-01-30 | 360.33
599 | 2018-01-31 | 120.00
600 | 2018-02-01 | 328.08
601 | 2018-02-02 | 393.58
602 | 2018-02-03 | 52.04
603 | 2018-02-04 | 206.91
604 | 2018-02-05 | 194.20
605 | 2018-02-06 | 102.89
606 | 2018-02-07 | 146.78
(30 rows)
ugautil=> select floor((extract(epoch from date) - extract(epoch from date('2018-01-01')))/86400/10) as "ten_day_index", round(avg(price),2) from sales group by 1 order by 1 limit 3;
ten_day_index | round
---------------+--------
0 | 199.90
1 | 274.70
2 | 273.96
(3 rows)
Only the Jan 9 and 10 entries contribute to the first average (group 0).
Upvotes: 0