Pandas Series - groupby and take cumulative most recent non-null

Question

I have a dataframe with a Category column (which we will group by) and a Value column. I want to add a new column LastCleanValue which shows the most recent non null value for this group. If there have not been any non-nulls yet in the group, we just take null. For example:

df = pd.DataFrame({'Category':['a','a','a','b','b','a','a','b','a','a','b'],
                   'Value':[np.nan, np.nan, 34, 40, 42, 25, np.nan, np.nan, 31, 33, np.nan]})

And the function should add a new column:

|    | Category   |   Value |   LastCleanValue |
|---:|:-----------|--------:|-----------------:|
|  0 | a          |     nan |              nan |
|  1 | a          |     nan |              nan |
|  2 | a          |      34 |               34 |
|  3 | b          |      40 |               40 |
|  4 | b          |      42 |               42 |
|  5 | a          |      25 |               25 |
|  6 | a          |     nan |               25 |
|  7 | b          |     nan |               42 |
|  8 | a          |      31 |               31 |
|  9 | a          |      33 |               33 |
| 10 | b          |     nan |               42 |

How can I do this in Pandas? I was attempting something like df.groupby('Category')['Value'].dropna().last()

BENY · Accepted Answer

This is more like ffill

df['new'] = df.groupby('Category')['Value'].ffill()
Out[430]: 
0      NaN
1      NaN
2     34.0
3     40.0
4     42.0
5     25.0
6     25.0
7     42.0
8     31.0
9     33.0
10    42.0
Name: Value, dtype: float64

Pandas Series - groupby and take cumulative most recent non-null

Answers (1)

Related Questions