Reputation: 80
I can't seem to find a perhaps easy solution to what I'm trying to accomplish here, using SQL and, more importantly, QuestDB. I also find it hard to put my exact question into words so bear with me.
Input
My real input is different of course but a similar dataset or case is the gas_prices
table on the demo page of QuestDB. On https://demo.questdb.io, you can directly write and run queries against some sample database, so it should be easy enough to follow.
The main task I want to accomplish is to find out which month was responsible for the year's highest galon price.
Output
Using the following query, I can get the average galon price per month just fine.
SELECT timestamp, avg(galon_price) as avg_per_month FROM 'gas_prices' SAMPLE BY 1M
timestamp | avg_per_month |
---|---|
2000-06-05T00:00:00.000000Z | 1.6724 |
2000-07-05T00:00:00.000000Z | 1.69275 |
2000-08-05T00:00:00.000000Z | 1.635 |
... | ... |
Then, I get all these monthly averages, group them by year and return the maximum galon price per year by wrapping the above query in a subquery, like so:
SELECT timestamp, max(avg_per_month) as max_per_year FROM (
SELECT timestamp, avg(galon_price) as avg_per_month FROM 'gas_prices' SAMPLE BY 1M
) SAMPLE BY 12M
timestamp | max_per_year |
---|---|
2000-01-05T00:00:00.000000Z | 1.69275 |
2001-01-05T00:00:00.000000Z | 1.767399999999 |
2002-01-05T00:00:00.000000Z | 1.52075 |
... | ... |
Wanted output
I want to know which month was responsible for the maximum price of a year.
Looking at the output of the above query, we see that the maximum galon price for the year 2000 was 1.69275. Which month of the year 2000 had this amount as average price? I'd like to display this month in an additional column.
For the first row, July 2000 is shown in the additional column for year 2000 because it is responsible for the highest average price in 2000. For the second row, it was May 2001 as that month had the highest average price of 2001.
timestamp | max_per_year | which_month_is_responsible |
---|---|---|
2000-01-05T00:00:00.000000Z | 1.69275 | 2000-07-05T00:00:00.000000Z |
2001-01-05T00:00:00.000000Z | 1.767399999999 | 2001-05-05T00:00:00.000000Z |
... | ... |
What did I try?
I tried by adding a subquery to the SELECT
to have a "duplicate" of some sort for the timestamp
column but that's apparently never valid in QuestDB (?), so probably the solution is by adding even more subqueries in the FROM
? Or a UNION?
Who can help me out with this? The data is there in the database and it can be calculated. It's just a matter of getting it out.
Upvotes: 1
Views: 113
Reputation: 395
I think 'wanted output' can be achieved with window functions. Please have a look at:
CREATE TABLE electricity (ts TIMESTAMP, consumption DOUBLE) TIMESTAMP(ts);
INSERT INTO electricity
SELECT (x*1000000)::timestamp, rnd_double()
FROM long_sequence(10000000);
SELECT day, ts, max_per_day
FROM
(
SELECT timestamp_floor('d', ts) as day,
ts,
avg_in_15_min as max_per_day,
row_number() OVER (PARTITION BY timestamp_floor('d', ts) ORDER BY avg_in_15_min desc) as rn_per_day
FROM
(
SELECT ts, avg(consumption) as avg_in_15_min
FROM electricity
SAMPLE BY 15m
)
) WHERE rn_per_day = 1
Upvotes: 2