Reputation: 2700
The following query helps me to calculate the average of historical values distributed on even time intervals.
EXPLAIN ANALYZE SELECT start_date as date, AVG(hcv1.value::float) as value
FROM generate_series(cast('2017-01-01' as abstime), cast('2017-12-01' as abstime), interval '86400 seconds') start_date
LEFT JOIN history_values hv
ON (
hv.variable_id = 3 AND
hv.created_at BETWEEN start_date AND start_date + interval '86400 seconds'
)
GROUP BY start_date
ORDER BY start_date
Here the report of the query: https://explain.depesz.com/s/q29a
Now if I try to add an extra column value2 pointing to another variable_id the query time goes from 2 seconds to 150 seconds:
EXPLAIN ANALYZE SELECT start_date as date,
AVG(hv1.value::float) as value1,
AVG(hv2.value::float) as value2
FROM generate_series(cast('2017-01-01' as abstime), cast('2017-12-01' as abstime), interval '86400 seconds') start_date
LEFT JOIN history_values hv1
ON (
hv1.variable_id = 2 AND
hv.created_at BETWEEN start_date AND start_date + interval '86400 seconds'
)
LEFT JOIN history_values hv2
ON (
hv2.variable_id = 3 AND
hv.created_at BETWEEN start_date AND start_date + interval '86400 seconds'
)
GROUP BY start_date
ORDER BY start_date
Here is the report: https://explain.depesz.com/s/V1sV
Could anybody tell me why? I was really expecting the time to be around 4 seconds, not almost 75 times more.
Also note that:
SELECT COUNT(*) FROM history_values WHERE variable_id = 2 -- ~25k records
SELECT COUNT(*) FROM history_values WHERE variable_id = 3 -- ~25k records
Upvotes: 1
Views: 80
Reputation: 1
You're not adding an extra column, you're adding another join condition. And you don't need that extra join anyway..
Try instead, just filtering the avg()
EXPLAIN ANALYZE
SELECT start_date as date,
AVG(hv1.value::float) FILTER ( WHERE hv1.variable_id = 1 ) as value1,
AVG(hv2.value::float) FILTER ( WHERE hv1.variable_id = 2 ) as value2
FROM generate_series(
cast('2017-01-01' as abstime)
, cast('2017-12-01' as abstime),
, interval '86400 seconds'
) AS start_date
LEFT JOIN history_values hv1
ON (
hv1.created_at >= cast('2017-01-01' as abstime) AND
hv1.created_at <= cast('2017-12-01' as abstime) AND
hv1.created_at >= start_date AND
hv1.created_at < start_date + interval '86400 seconds'
)
GROUP BY start_date
ORDER BY start_date
As a side note, you should not ever be using abstime
. That should be for internal use only. Instead, I would use
EXPLAIN ANALYZE
SELECT start_date::date AS date,
AVG(hv1.value::float) FILTER ( WHERE hv1.variable_id = 1 ) as value1,
AVG(hv2.value::float) FILTER ( WHERE hv1.variable_id = 2 ) as value2
FROM generate_series(
timestamp with time zone '2017-01-01',
timestamp with time zone '2017-12-01',
interval '1 day'
) AS start_date
LEFT JOIN history_values hv1
ON (
hv1.created_at BETWEEN (
timestamp with time zone '2017-01-01'
AND timestamp with time zone '2017-12-01'
) AND
hv1.created_at >= start_date AND
hv1.created_at < start_date + interval '1 day' AND
hv1.variable_id IN (1,2)
)
GROUP BY start_date
ORDER BY start_date
I would also think you could collapse those ranges down..
EXPLAIN ANALYZE
SELECT start_date::date AS date,
AVG(hv1.value::float) FILTER ( WHERE hv1.variable_id = 1 ) as value1,
AVG(hv2.value::float) FILTER ( WHERE hv1.variable_id = 2 ) as value2
FROM generate_series(
timestamp with time zone '2017-01-01',
timestamp with time zone '2017-12-01' - interval '1 day'
interval '1 day'
) AS start_date
LEFT JOIN history_values hv1
ON hv1.created_at BETWEEN start_date AND (start_date + interval '1 day' )
AND hv1.variable_id IN (1,2)
GROUP BY start_date
ORDER BY start_date
In the future, please ask questions specific to PostgreSQL on http://dba.stackexchange.com. I would flag this for migration there. The admins will gladly move it.
Upvotes: 2