Samir Hinojosa
Samir Hinojosa

Reputation: 825

How to get the most recent value for each column considering the NaN values using pandas

I have the following dataframe

  |  Name  |  Year  |  COL1  |  COL2  |  COL3  |
------------------------------------------------
0 |    A   |  1980  |   NaN  |   NaN  |  9817  |
1 |    A   |  1981  |   NaN  |  1610  |  11487 |
2 |    A   |  1982  | 22.74  |   NaN  |  12256 |
3 |    A   |  1983  | 15.50  |   NaN  |  14214 |
4 |    B   |  1980  |   NaN  |  1580  |   NaN  |
5 |    B   |  1981  | 13.50  |  1603  |   NaN  |
6 |    B   |  1982  | 17.00  |  1610  |   NaN  |
7 |    B   |  1983  | 19.30  |   NaN  |   NaN  |
8 |    C   |  1980  | 12.00  |   NaN  |  1750  |
9 |    C   |  1981  | 15.50  |  1997  |  1790  |
10|    C   |  1982  | 16.36  |  2027  |  1790  |
11|    C   |  1983  | 22.74  |   NaN  |  1810  |

I need to get the following data of dataframe above.
This data is the most recent (by year) data for each Name considering the columns COL1, COl2 and COL3

  Name  |  COL1  |  COL2  |  COL3  |
------------------------------------------------
    A   | 15.50  |  1610  |  14214 |
    B   | 19.30  |  1610  |   NaN  |
    C   | 22.74  |  2027  |  1810  |

Upvotes: 0

Views: 43

Answers (1)

jezrael
jezrael

Reputation: 862581

Use GroupBy.last for last non missing value per groups:

#if necessary
#df = df.sort_values(['Name','Year'])
df = df.groupby('Name', as_index=False).last()

Upvotes: 3

Related Questions