Reputation: 825
I have the following dataframe
| Name | Year | COL1 | COL2 | COL3 |
------------------------------------------------
0 | A | 1980 | NaN | NaN | 9817 |
1 | A | 1981 | NaN | 1610 | 11487 |
2 | A | 1982 | 22.74 | NaN | 12256 |
3 | A | 1983 | 15.50 | NaN | 14214 |
4 | B | 1980 | NaN | 1580 | NaN |
5 | B | 1981 | 13.50 | 1603 | NaN |
6 | B | 1982 | 17.00 | 1610 | NaN |
7 | B | 1983 | 19.30 | NaN | NaN |
8 | C | 1980 | 12.00 | NaN | 1750 |
9 | C | 1981 | 15.50 | 1997 | 1790 |
10| C | 1982 | 16.36 | 2027 | 1790 |
11| C | 1983 | 22.74 | NaN | 1810 |
I need to get the following data of dataframe above.
This data is the most recent (by year
) data for each Name
considering the columns COL1
, COl2
and COL3
Name | COL1 | COL2 | COL3 |
------------------------------------------------
A | 15.50 | 1610 | 14214 |
B | 19.30 | 1610 | NaN |
C | 22.74 | 2027 | 1810 |
Upvotes: 0
Views: 43
Reputation: 862581
Use GroupBy.last
for last non missing value per groups:
#if necessary
#df = df.sort_values(['Name','Year'])
df = df.groupby('Name', as_index=False).last()
Upvotes: 3