bzyueshen
bzyueshen

Reputation: 41

Returning the last valid (non-null) value from a pandas DataFrame

Suppose I have a dataframe looks like:

      a      b
0    11      A
1    -2      A
2     3      A
3    NA      A
4   0.5      B
5    NA      B
6    -9      B

I can create a group by 'b'. Is there a fast way to get the last non-NA value in 'a' of each group? In this case would be 3 for group A and -9 for group B.

(In this case the series 'a' is sorted as given, but it might not be the case. There could be another column 'c', according which the 'last' is defined.)

I wrote my own loop code by looking into the grouped.groups dict. But apparently that's very inefficient given my huge dataset. I think this could be done very straightforwardly -- maybe I am just too novice with pandas :-)

Upvotes: 4

Views: 2077

Answers (1)

Wes McKinney
Wes McKinney

Reputation: 105591

I added a github issue for this recently: https://github.com/pydata/pandas/issues/1043

In the meantime, you could do:

def get_last_valid(series):
    return series.dropna().iget(-1)

df.groupby('b')['a'].apply(get_last_valid)

Upvotes: 4

Related Questions