Reputation: 311
I have a table which stores predictions from a machine learning model. This is a model that each hour ("predicted_at") predicts a value for the next 24 hours("predicted_for"). This means that the table have many different values for each "id" and "predicted_for".
Example of how the the table looks like for one ID and one predicted_for timestamp:
value | id | predicted_at | predicted_for | |
---|---|---|---|---|
0.1825819489860161 | 6970631400382957 | 2021-08-21 16:00:00 UTC | 2021-08-21 23:00:00 UTC | |
0.14882256844401498 | 6970631400382957 | 2021-08-21 17:00:00 UTC | 2021-08-20 23:00:00 UTC | |
0.17357749613149909 | 6970631400382957 | 2021-08-21 17:00:00 UTC | 2021-08-21 23:00:00 UTC | |
0.20283864055714163 | 6970631400382957 | 2021-08-21 18:00:00 UTC | 2021-08-21 23:00:00 UTC | |
0.18582932551434195 | 6970631400382957 | 2021-08-21 19:00:00 UTC | 2021-08-21 23:00:00 UTC | |
0.18859835855398877 | 6970631400382957 | 2021-08-21 20:00:00 UTC | 2021-08-21 23:00:00 UTC | |
0.15969341546283378 | 6970631400382957 | 2021-08-21 21:00:00 UTC | 2021-08-21 23:00:00 UTC | |
0.21578765348925422 | 6970631400382957 | 2021-08-21 22:00:00 UTC | 2021-08-21 23:00:00 UTC |
What I want to do is to query this data so that I get, for each ID, only one predicted value for each "predicted_for" timestemp, and I want this value to be "predicted_at" at "predicted_for - t(2) hours".
For the example, the result would give me the following table:
value | id | predicted_at | predicted_for | |
---|---|---|---|---|
0.15969341546283378 | 6970631400382957 | 2021-08-21 21:00:00 UTC | 2021-08-21 23:00:00 UTC |
I assume I would have some kind of group by and a having clause, but I have not figured out how to solve it.
Anyone have an idea?
Upvotes: 1
Views: 56
Reputation: 173190
Consider below approach
select any_value(value) value, id,
any_value(predicted_at) predicted_at, predicted_for,
from data t
where predicted_at = predicted_for - interval 2 hour
group by id, predicted_for
if applied to sample data in your question - output is
Upvotes: 1
Reputation: 1952
Your query should look like this one:
SELECT *
FROM `YOURTABLE`
WHERE DATETIME_DIFF(predicted_for, predicted_at, HOUR) = 2
Note that it is not ensuring that you only have one record that matches the rule of having predicted_at preceding by 2 hours the value in predicted_for. To also ensure this, the query should be like this:
SELECT *
FROM `YOURTABLE` tab
WHERE predicted_at = (
SELECT MAX(predicted_at)
FROM `YOURTABLE` tab2
WHERE DATETIME_DIFF(tab2.predicted_for, tab2.predicted_at, HOUR) = 2
AND tab2.id = tab.id
)
Upvotes: 0
Reputation: 1
do a query like:
SELECT p.* FROM
( SELECT max(predicted_at) as predicted_at_max
FROM youraboveschowedtable
GROUP BY id ) p1,
DATEADD(hour, -2, predicted_at) as predicted_minus2
JOIN youraboveschowedtable p
ON p.predicted_at = p1.predicted_at_max
litle altert Source = SQL query return duplicate with most recent date and more
Upvotes: 0