Reputation: 449
I have a DataFrame df
with some country statistics for years from 2014 to 2018. Some of the countries have values for each of the years, while some countries are missing some. The DataFrame looks like this:
countryName 2014 2015 2016 2017 2018
Afghanistan .. .. 281 .. ..
Albania .. .. .. 891 901
Algeria 791 801 804 817 820
...
I want to keep only the most recent data value, so for the DataFrame above, the result should be:
countryName value
Afghanistan 281 # the most recent value from 2016
Albania 901 # the most recent value from 2018
Algeria 820 # the most recent value from 2018
...
Upvotes: 2
Views: 309
Reputation: 11502
You could do this:
assume your df is
countryName 2014 2015 2016 2017 2018
0 Afghanistan NaN NaN 281.0 NaN ..
1 Albania NaN NaN NaN 891.0 901
2 Algeria 791.0 801.0 804.0 817.0 820
Then
df['LastValue'] = df.iloc[:, 1:].ffill(axis=1).iloc[:, -1]
produces
countryName 2014 2015 2016 2017 2018 LastValue
0 Afghanistan NaN NaN 281.0 NaN NaN 281.0
1 Albania NaN NaN NaN 891.0 901.0 901.0
2 Algeria 791.0 801.0 804.0 817.0 820.0 820.0
Upvotes: 0
Reputation: 28709
You could use the forward fill on the columns axis to get the last numeric values
data = [{'countryName': 'Afghanistan',
'2014': nan,
'2015': nan,
'2016': 281.0,
'2017': nan,
'2018': nan},
{'countryName': 'Albania',
'2014': nan,
'2015': nan,
'2016': nan,
'2017': 891.0,
'2018': 901.0},
{'countryName': 'Algeria',
'2014': 791.0,
'2015': 801.0,
'2016': 804.0,
'2017': 817.0,
'2018': 820.0}]
df = pd.DataFrame(data)
df.ffill(1).iloc[:, [0, -1]].set_axis(['countryName', 'value'], axis=1)
countryName value
0 Afghanistan 281.0
1 Albania 901.0
2 Algeria 820.0
Upvotes: 3