Reputation: 11
I have a table that looks like the below.
ParentID | PersonID | Year
----------------------------
1 1 2019
1 2 2020
3 3 2019
3 4 2020
5 5 2019
I'm trying to figure out how to select the current PersonID when a ParentID has more than one record so my results would look like the below.
ParentID | PersonID | Year
----------------------------
1 2 2020
3 4 2020
5 5 2019
I can't select just the max PersonID because we sometimes create Person records for the previous year, in which case the PersonID is greater, and we still want to return this year's record. I also can't select based on year, because if they don't have a record for this year, we still need their most recent record for each ever year that is.
I've tried selecting this subset in half a dozen ways at this point and have only managed to make my brain hurt. Any assistance would be appreciated!!
Upvotes: 1
Views: 43
Reputation: 2125
Assuming you are using MSSQL, this can be achieved by ROW_NUMBER
. You can read more about ROW_NUMBER
here. The PARTITION BY
divides the result into partitions and apply row numbers to the partitions. So, applying partition to ParentId
and sorting with Year
descending, the data sorted ParentId
by Year
. Then remove the older data by using the RowNo = 1
condition.
Create Table Test(ParentId int, PersonId int, Year int);
INSERT INTO Test values
(1, 1, 2019),
(1, 2, 2020),
(3, 3, 2019),
(3, 4, 2020),
(5, 5, 2019);
SELECT ParentId, PersonId, Year FROM
(
SELECT ROW_NUMBER() OVER(PARTITION BY ParentId
ORDER BY Year /* Use PersonId if it fits correctly */ DESC) AS RowNo,
ParentId, PersonId, Year from Test -- Table Name
) E WHERE ROWNo = 1
Upvotes: 0
Reputation: 222482
This is a typical greatest-n-per-group problem. To solve it, you need to think filtering rather than aggregation.
A portable solution is to filter with a correlated subquery that returns the latest year
per parent_id
:
select t.*
from mytable t
where t.year = (
select max(t1.year) from mytable t1 where t1.parent_id = t.parent_id
)
Upvotes: 1