Reputation: 2863
I have a database of sports race results, a Race ID which will appear up to 8 times with rows for each Athelte Name and Finish Time data.
I am trying to extract the 4th placed time for each race.
My current (pseudo)code is:
races = "SELECT DISTINCT race_id, race_date FROM race_table ORDER BY race_date"
for race in races:
fourth_place = "SELECT race_date, finish_time FROM race_table WHERE race_id={} ORDER BY finish_time LIMIT 1 OFFSET 3".format(race['race_id'])
In Python I am running a seperate query for each race which takes time.
Is there a way I can construct a nested PostgreSQL Query that will return this dataset of one row per race and that row is an offset subquery?
My initial attempts have only returned a single row. If this can't be done in SQL SELECT should I be looking into Stored Procedures?
Edit: Example dataset, single table, unsorted:
race_id, race_date, athlete, finish_time
1001, 2021-01-16, "Joe B", 12.54
1001, 2021-01-16, "Emma J", 12.03
1001, 2021-01-16, "Tim N", 12.66
1001, 2021-01-16, "Joe B", 13.54
1001, 2021-01-16, "Sarah J", 12.28
1001, 2021-01-16, "Tom N", 13.22
1001, 2021-01-16, "Jim Z", 12.37
1001, 2021-01-16, "Emma K", 11.94
1002, 2021-01-22, "John B", 13.01
1002, 2021-01-22, "Jane D", 13.22
1002, 2021-01-22, "Ron H", 14.01
1002, 2021-01-22, "Dan M", 15.22
1002, 2021-01-22, "Gill P", 14.27
1002, 2021-01-22, "Jo V", 13.88
1002, 2021-01-22, "Harry T", 13.21
1002, 2021-01-22, "Jane Q", 14.62
Query should return one row per race (and that should be selected by the 4th placed (4th lowest number) time:
race_date, finish_time
2021-01-16, 12.37
2021-01-22, 13.88
Upvotes: 0
Views: 68
Reputation: 3262
Your example:
CREATE TABLE race_table(race_id integer, race_date date, athlete text, finish_time numeric (4,2));
INSERT INTO race_table VALUES
(1001, '2021-01-16', 'Joe B', '12.54'),
(1001, '2021-01-16', 'Emma J', '12.03'),
(1001, '2021-01-16', 'Tim N', '12.66'),
(1001, '2021-01-16', 'Joe B', '13.54'),
(1001, '2021-01-16', 'Sarah J', '12.03'),
(1001, '2021-01-16', 'Tom N', '13.22'),
(1001, '2021-01-16', 'Jim Z', '12.37'),
(1001, '2021-01-16', 'Emma K', '11.94'),
(1002, '2021-01-22', 'John B', '13.01'),
(1002, '2021-01-22', 'Jane D', '13.22'),
(1002, '2021-01-22', 'Ron H', '14.01'),
(1002, '2021-01-22', 'Dan M', '15.22'),
(1002, '2021-01-22', 'Gill P', '14.27'),
(1002, '2021-01-22', 'Jo V', '13.88'),
(1002, '2021-01-22', 'Harry T', '13.21'),
(1002, '2021-01-22', 'Jane Q', '14.62');
I modified your data to contain a duplicate value for finish_time.
You start with a query that adds a column containing the rank to your data:
SELECT race_id,race_date,athlete,finish_time,
rank() OVER(PARTITION BY race_id ORDER BY finish_time)AS row_nr
FROM race_table
I think rank()
is what you need. Depending on your requirements you may use row_number()
or dense_rank()
instead. Check it out - it affects how the duplicate value is handled.
Then use this query and filter by row_nr
:
WITH race_ranked AS
(SELECT race_id,race_date,athlete,finish_time,
rank() OVER(PARTITION BY race_id ORDER BY finish_time)AS row_nr
FROM race_table )
SELECT race_id,race_date,athlete,finish_time
FROM race_ranked
WHERE row_nr = 4;
Output:
race_id | race_date | athlete | finish_time
---------+------------+---------+-------------
1001 | 2021-01-16 | Jim Z | 12.37
1002 | 2021-01-22 | Jo V | 13.88
Upvotes: 1