Reputation: 40810
I have the following table with PostgreSQL+TimescaleDB:
create table heatingElectricityMeter(
time timestamptz(0) not null,
import real not null
);
select create_hypertable('heatingElectricityMeter', 'time', chunk_time_interval => interval '1 month');
"import" is the total energy consumed since the energy meter was produced. Thus "import" is monotonically increasing. About once every 10seconds I insert a value to that table (however currently I have much less then one value for every 10s in the DB).
Now I would like to create a bar-chart that shows energy consumed per local day for the last X days. Let's say... The last 90 days.
Here is my first attempt:
select
time_bucket(interval '1 day', time::timestamp) "day",
first(import, "time"),
last(import, "time")
from heatingelectricitymeter
where time between '2021-07-28T02:00:00Z' and '2021-10-28T02:00:00Z'
group by "day"
order by 1
This query takes roughly 2.5seconds on my Raspberry Pie 2 with ~85K data points in the queried time frame. Not really a "problem" but... It could be faster. My next attempt:
select
time_bucket(interval '1 day', time::timestamp) "day",
max(import) - min(import)
from heatingelectricitymeter
where time between '2021-07-28T02:00:00Z' and '2021-10-28T02:00:00Z'
group by "day"
order by 1
The effect should be the same and I kind of expected this to perform the same. However to my surprise this is quite a lot faster: It just takes ~1.5 seconds.
(Side note: Not casting to ::timestamp
speeds up the queries to 1.7seconds and 1.0seconds respectively. However I eventually want to use midnight-midnight in local time and if I understand the manual correctly that means that I will need to do this cast)
Now one thing is that the docs say:
The last and first commands do not use indexes, and instead perform a sequential scan through their groups.
This does not really answer why min
and max
are faster. They can't use an index either. At the same time using and index seems to make sense. We just need the first and last value for each group.
My next attempt is to gather the required values directly and not use grouping:
select "day", "import" - lag("import") over (order by "day") from (
select ('2021-10-28'::date - v) "day"
from generate_series(0, 90) s(v)
)x
join lateral (
select time, import from heatingElectricityMeter where time >= day order by time limit 1
)y on true
BINGO! Down to 0.7 seconds!
The result is not exactly the same as min/max/first/last, as in this query I will always compare the first value of each day with the first value of the previous day. In my opinion this is actually better though, because data might land on the wrong day, but it won't be lost. (In a perfect world I'd get the interpolated values at exactly midnight for each day, however I am putting that off for simplicity. For my use case I can life with the slight inaccuracy).
However it feels like I am doing it in a way to complicated fashion. To me it seems like something which is perfectly normal task to do with a time database. Is there a simpler way to efficiently do this query?
Upvotes: 4
Views: 624
Reputation: 1053
Sometimes there is still some work you have to do to optimize this, the lateral query will use indexes and will only have to do index scans on one side, and then the lag will get the previous value looked up by the index scan.
Note that you are getting slightly different values with the last query, instead of getting the min and max in a given day you're subtracting the last value of the previous day from the last value of the current, which is a slightly different thing. As long as you are okay with that, you're fine.
Now - min
and max
can, in fact, use indexes, they get optimized differently and can be made to use index scans depending on what indexes you have, so that might be why that's faster.
Not sure if that's a good enough answer or explains enough, but it is what I think is going on here.
Note: we added some functionality in TimescaleDB around this, I'd recommend reading this post in our forum that addresses this problem in more detail if you're interested: https://www.timescale.com/forum/t/dear-eon-measuring-deltas-correctly-for-energy-meters/1266
Upvotes: 1