Reputation: 343
How can I return the row index location of the last non-nan value for each column of the pandas data frame and return the locations as a pandas dataframe?
Upvotes: 5
Views: 5945
Reputation: 4091
If you want the row index of the last non-nan (and non-none) value, here is a one-liner:
>>> df = pd.DataFrame({
'a':[5,1,2,NaN],
'b':[NaN, 6,NaN, 3]})
>>> df
a b
0 5 NaN
1 1 6
2 2 NaN
3 NaN 3
>>> df.apply(lambda column: column.dropna().index[-1])
a 2
b 3
dtype: int64
Explanation:
df.apply
in this context applies a function to each column of the dataframe. I am passing it a function that takes as its argument a column, and returns the column's last non-null index.
Upvotes: 3
Reputation: 393983
Use notnull
and specifically idxmax
to get the index values of the non NaN
values
In [22]:
df = pd.DataFrame({'a':[0,1,2,NaN], 'b':[NaN, 1,NaN, 3]})
df
Out[22]:
a b
0 0 NaN
1 1 1
2 2 NaN
3 NaN 3
In [29]:
df[pd.notnull(df)].idxmax()
Out[29]:
a 2
b 3
dtype: int64
EDIT
Actually as correctly pointed out by @Caleb you can use last_valid_index
which is designed for this:
In [3]:
df = pd.DataFrame({'a':[3,1,2,np.NaN], 'b':[np.NaN, 1,np.NaN, -1]})
df
Out[3]:
a b
0 3 NaN
1 1 1
2 2 NaN
3 NaN -1
In [6]:
df.apply(pd.Series.last_valid_index)
Out[6]:
a 2
b 3
dtype: int64
Upvotes: 9