Reputation: 41
Suppose I have a dataframe
looks like:
a b
0 11 A
1 -2 A
2 3 A
3 NA A
4 0.5 B
5 NA B
6 -9 B
I can create a group by 'b'. Is there a fast way to get the last non-NA value in 'a' of each group? In this case would be 3 for group A and -9 for group B.
(In this case the series 'a' is sorted as given, but it might not be the case. There could be another column 'c', according which the 'last' is defined.)
I wrote my own loop code by looking into the grouped.groups dict. But apparently that's very inefficient given my huge dataset. I think this could be done very straightforwardly -- maybe I am just too novice with pandas :-)
Upvotes: 4
Views: 2077
Reputation: 105591
I added a github issue for this recently: https://github.com/pydata/pandas/issues/1043
In the meantime, you could do:
def get_last_valid(series):
return series.dropna().iget(-1)
df.groupby('b')['a'].apply(get_last_valid)
Upvotes: 4