MattP
MattP

Reputation: 2863

PostgreSQL Recursive / Subset Query

I have a database of sports race results, a Race ID which will appear up to 8 times with rows for each Athelte Name and Finish Time data.

I am trying to extract the 4th placed time for each race.

My current (pseudo)code is:

races = "SELECT DISTINCT race_id, race_date FROM race_table ORDER BY race_date"

for race in races:
    fourth_place = "SELECT race_date, finish_time FROM race_table WHERE race_id={} ORDER BY finish_time LIMIT 1 OFFSET 3".format(race['race_id'])

In Python I am running a seperate query for each race which takes time.

Is there a way I can construct a nested PostgreSQL Query that will return this dataset of one row per race and that row is an offset subquery?

My initial attempts have only returned a single row. If this can't be done in SQL SELECT should I be looking into Stored Procedures?

Edit: Example dataset, single table, unsorted:

race_id, race_date, athlete, finish_time
1001, 2021-01-16, "Joe B", 12.54
1001, 2021-01-16, "Emma J", 12.03
1001, 2021-01-16, "Tim N", 12.66
1001, 2021-01-16, "Joe B", 13.54
1001, 2021-01-16, "Sarah J", 12.28
1001, 2021-01-16, "Tom N", 13.22
1001, 2021-01-16, "Jim Z", 12.37
1001, 2021-01-16, "Emma K", 11.94
1002, 2021-01-22, "John B", 13.01
1002, 2021-01-22, "Jane D", 13.22
1002, 2021-01-22, "Ron H", 14.01
1002, 2021-01-22, "Dan M", 15.22
1002, 2021-01-22, "Gill P", 14.27
1002, 2021-01-22, "Jo V", 13.88
1002, 2021-01-22, "Harry T", 13.21
1002, 2021-01-22, "Jane Q", 14.62

Query should return one row per race (and that should be selected by the 4th placed (4th lowest number) time:

race_date, finish_time
2021-01-16, 12.37
2021-01-22, 13.88

Upvotes: 0

Views: 68

Answers (1)

clamp
clamp

Reputation: 3262

Your example:

CREATE TABLE race_table(race_id integer, race_date date, athlete text, finish_time numeric (4,2));
INSERT INTO race_table VALUES
(1001, '2021-01-16', 'Joe B', '12.54'),
(1001, '2021-01-16', 'Emma J', '12.03'),
(1001, '2021-01-16', 'Tim N', '12.66'),
(1001, '2021-01-16', 'Joe B', '13.54'),
(1001, '2021-01-16', 'Sarah J', '12.03'),
(1001, '2021-01-16', 'Tom N', '13.22'),
(1001, '2021-01-16', 'Jim Z', '12.37'),
(1001, '2021-01-16', 'Emma K', '11.94'),
(1002, '2021-01-22', 'John B', '13.01'),
(1002, '2021-01-22', 'Jane D', '13.22'),
(1002, '2021-01-22', 'Ron H', '14.01'),
(1002, '2021-01-22', 'Dan M', '15.22'),
(1002, '2021-01-22', 'Gill P', '14.27'),
(1002, '2021-01-22', 'Jo V', '13.88'),
(1002, '2021-01-22', 'Harry T', '13.21'),
(1002, '2021-01-22', 'Jane Q', '14.62');

I modified your data to contain a duplicate value for finish_time.

You start with a query that adds a column containing the rank to your data:

SELECT race_id,race_date,athlete,finish_time,
 rank() OVER(PARTITION BY race_id ORDER BY finish_time)AS row_nr
FROM race_table

I think rank()is what you need. Depending on your requirements you may use row_number() or dense_rank() instead. Check it out - it affects how the duplicate value is handled.

Then use this query and filter by row_nr:

WITH race_ranked AS 
(SELECT race_id,race_date,athlete,finish_time,
 rank() OVER(PARTITION BY race_id ORDER BY finish_time)AS row_nr
FROM race_table )
SELECT race_id,race_date,athlete,finish_time 
FROM race_ranked 
WHERE row_nr = 4;

Output:

 race_id | race_date  | athlete | finish_time 
---------+------------+---------+-------------
    1001 | 2021-01-16 | Jim Z   |       12.37
    1002 | 2021-01-22 | Jo V    |       13.88

Upvotes: 1

Related Questions