ahat

Reputation: 343

Get row-index of the last non-NaN value in each column of a pandas data frame

How can I find the row index of the last non-NaN value in each column of a pandas DataFrame, and return those locations as a pandas DataFrame?

Upvotes: 5

Views: 5945

Answers (2)

Caleb

Reputation: 4091

If you want the row index of the last non-NaN (and non-None) value in each column, here is a one-liner:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'a': [5, 1, 2, np.nan],
...     'b': [np.nan, 6, np.nan, 3]})

>>> df
    a   b
0   5 NaN
1   1   6
2   2 NaN
3 NaN   3

>>> df.apply(lambda column: column.dropna().index[-1])
a    2
b    3
dtype: int64

Explanation:

Called on a DataFrame with the default axis, df.apply passes each column to the function as a Series. The lambda drops the null entries from that column and returns the last remaining index label. Note that it raises an IndexError for a column that contains no valid values at all.
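As a minimal sketch of the same idea with the all-NaN case handled, the helper below returns None when a column has no valid values. The function name last_non_nan_index and the extra column "c" are hypothetical, added only for illustration.

```python
import numpy as np
import pandas as pd


def last_non_nan_index(column):
    """Return the index label of the last non-null value, or None if there is none."""
    valid = column.dropna()
    return valid.index[-1] if not valid.empty else None


# Hypothetical data: column "c" contains no valid values at all.
df = pd.DataFrame({
    "a": [5, 1, 2, np.nan],
    "b": [np.nan, 6, np.nan, 3],
    "c": [np.nan] * 4,
})

# Build a plain dict so the result does not depend on how apply() combines
# None with integer labels.
last_rows = {name: last_non_nan_index(col) for name, col in df.items()}
print(last_rows)
```

A dict is used here instead of df.apply so that the None for the empty column is preserved as-is.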

Upvotes: 3

EdChum

Reputation: 393983

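Use notnull together with idxmax. idxmax returns the first occurrence of the maximum, so reversing the rows of the boolean mask makes it land on the last non-NaN row of each column. (Calling idxmax on the values themselves returns the row of the largest value, which matches only when that value happens to be the last valid one.)

In [22]:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':[0,1,2,np.nan], 'b':[np.nan, 1,np.nan, 3]})
df
Out[22]:
    a   b
0   0 NaN
1   1   1
2   2 NaN
3 NaN   3
In [29]:

df.notnull()[::-1].idxmax()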
Out[29]:
a    2
b    3
dtype: int64
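To see how idxmax on the raw values differs from idxmax on a reversed notnull mask, here is a small sketch. The data is hypothetical, chosen so that the largest value in each column is not the last valid one.

```python
import numpy as np
import pandas as pd

# Hypothetical data in which the largest value is not the last valid one.
df = pd.DataFrame({"a": [3, 1, 2, np.nan], "b": [np.nan, 1, np.nan, -1]})

# idxmax on the values finds the row holding the largest value in each column.
largest_value_row = df.idxmax()

# idxmax on the reversed boolean mask finds the last row that is not NaN,
# because it returns the first True it meets when scanning from the bottom.
last_valid_row = df.notnull()[::-1].idxmax()

print(largest_value_row)
print(last_valid_row)
```

One caveat with the mask approach: for a column with no valid values the mask is all False, so idxmax still returns a label (the first one it scans) rather than signalling that nothing was found.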

EDIT

Actually, as @Caleb correctly pointed out, you can use last_valid_index, which is designed for this:

In [3]:
df = pd.DataFrame({'a':[3,1,2,np.nan], 'b':[np.nan, 1,np.nan, -1]})
df

Out[3]:
    a   b
0   3 NaN
1   1   1
2   2 NaN
3 NaN  -1

In [6]:
df.apply(pd.Series.last_valid_index)

Out[6]:
a    2
b    3
dtype: int64
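The question also asks for the locations as a DataFrame rather than a Series. One way to get that, sketched here, is to call to_frame on the result; the column name "last_valid_index" is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2, np.nan], "b": [np.nan, 1, np.nan, -1]})

# apply() returns a Series indexed by the original column names;
# to_frame() turns it into a one-column DataFrame.
# "last_valid_index" is a hypothetical column name.
locations = df.apply(pd.Series.last_valid_index).to_frame(name="last_valid_index")
print(locations)
```

The resulting frame has one row per original column, which makes it easy to join against other per-column summaries.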

Upvotes: 9
