Reputation: 1546
I have a postgres DB in which most of the tables have a column 'valid_time' indicating when the data in that row is intended to represent and an 'analysis_time' column, indicating when the estimate was made (this might be the same or a later time than the valid time in the case of a measurement or an earlier time in the case of a forecast). Typically there are multiple analysis times for each valid time, corresponding to different measurements (if you wait a bit, more data is available for a given time, so the analysis is better but the measurment is less prompt) and forecasts with different lead times.
I am using SQLalchemy to access this DB in Python.
What I would like to do is be able to pull out all rows with the most recent N unique datetimes of a specified column. For instance I might want the 3 most recent unique valid times, but this will typically be more than 3 rows, because there will be multiple analysis times for each of those 3 valid times.
I am new to relational databases. In a sense there are two parts to this question; how can this be achieved in bare SQL and then how to translate that to the SQLalchemy ORM?
Upvotes: 0
Views: 160
Reputation: 1216
I'm not sure about the SQLalchemy part, but as far as the SQL queries I would do it in two steps:
Get the times. For example, something like.
SELECT DISTINCT valid_time FROM MyTable LIMIT 3 ORDER BY valid_time DESC;
Get the rows with those times, using the previous step as a subquery:
SELECT * FROM MyTable WHERE valid_time IN (SELECT DISTINCT valid_time FROM MyTable LIMIT 3 ORDER BY valid_time DESC);
Upvotes: 1
Reputation: 1269873
Here is a solution to the first part of your question . . . how to do this in SQL:
select t.*
from (select t.*,
dense_rank() over (order by validtime desc) as seqnum
from t
) t
where seqnum <= N;
This is using the dense_rank()
window/analytic function. This enumerates the time values, with ties given the same rank value. The where
clause simply chooses how many you want.
Upvotes: 2