Reputation: 227
I have this dataset
patient_id doctor_id status created_at
1 1 A 2020-10-01 10:00:00
1 1 P 2020-10-01 10:30:00
1 1 U 2020-10-01 10:35:00
1 2 A 2020-10-01 10:40:00
...
I want to group it by patient_id and doctor_id but without the status is grouped so the result will be like this
patient_id doctor_id status created_at
1 1 U 2020-10-01 10:35:00
1 2 A 2020-10-01 10:40:00
...
AWS Athena have to grouped all column but I need the last status
Upvotes: 0
Views: 796
Reputation: 132922
In Athena/Presto you can do this with the max_by
function:
SELECT
patient_id,
doctor_id,
MAX_BY(status, created_at) AS last_status
FROM the_table
GROUP BY 1, 2
max_by(x, y)
function returns the value of the column x
for the row with the max value of column y
of the group.
Upvotes: 2
Reputation: 522386
ROW_NUMBER
provides one option here:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY patient_id, doctor_id ORDER BY created_at DESC) rn
FROM yourTable
)
SELECT patient_id, doctor_id, status, created_at
FROM cte
WHERE rn = 1
ORDER BY patient_id, doctor_id;
Upvotes: 1