Bogdanovist
Bogdanovist

Reputation: 1546

Selecting the rows with the N most recent unique values of a datetime

I have a postgres DB in which most of the tables have a column 'valid_time' indicating when the data in that row is intended to represent and an 'analysis_time' column, indicating when the estimate was made (this might be the same or a later time than the valid time in the case of a measurement or an earlier time in the case of a forecast). Typically there are multiple analysis times for each valid time, corresponding to different measurements (if you wait a bit, more data is available for a given time, so the analysis is better but the measurment is less prompt) and forecasts with different lead times.

I am using SQLalchemy to access this DB in Python.

What I would like to do is be able to pull out all rows with the most recent N unique datetimes of a specified column. For instance I might want the 3 most recent unique valid times, but this will typically be more than 3 rows, because there will be multiple analysis times for each of those 3 valid times.

I am new to relational databases. In a sense there are two parts to this question; how can this be achieved in bare SQL and then how to translate that to the SQLalchemy ORM?

Upvotes: 0

Views: 160

Answers (2)

Or Neeman
Or Neeman

Reputation: 1216

I'm not sure about the SQLalchemy part, but as far as the SQL queries I would do it in two steps:

  1. Get the times. For example, something like.

    SELECT DISTINCT valid_time FROM MyTable LIMIT 3 ORDER BY valid_time DESC;

  2. Get the rows with those times, using the previous step as a subquery:

    SELECT * FROM MyTable WHERE valid_time IN (SELECT DISTINCT valid_time FROM MyTable LIMIT 3 ORDER BY valid_time DESC);

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269873

Here is a solution to the first part of your question . . . how to do this in SQL:

select t.*
from (select t.*,
             dense_rank() over (order by validtime desc) as seqnum
      from t
     ) t
where seqnum <= N;

This is using the dense_rank() window/analytic function. This enumerates the time values, with ties given the same rank value. The where clause simply chooses how many you want.

Upvotes: 2

Related Questions