Reputation: 11
I have table taht looks like following
A | B | C
__________
na| 1 | 2
3 | 3 | 4
na| 5 | 6
na| 7 | 8
2 | 9 | 10
na| 11| 12
na| 13| 14
3 | 15| 16
I would like to average all by column A. All the nan values till the number, final outcome should look like
A | B | C
__________
3 | 2 | 3
2 | 7 | 8
3 |13 |14
Upvotes: 1
Views: 103
Reputation: 4827
df = pd.DataFrame({'A':[pd.NA,3,pd.NA,pd.NA,2,pd.NA,pd.NA,3], 'B':range(1,16,2), 'C':range(2,17,2)})
df.assign(idx = [i if i in df.index[df.A.notna()] else pd.NA for i in df.index]).bfill().groupby('idx').mean().astype(int).reset_index(drop=True)
Output:
A B C
0 3 2 3
1 2 7 8
2 3 13 14
Upvotes: 2
Reputation: 13841
You could backward fill column A and then perform a groupby
with mean()
:
df.assign(A = lambda col:col['A'].bfill()).groupby('A',as_index=False).mean()
prints:
A B C
0 2.0 7.0 8.0
1 3.0 2.0 3.0
EDIT:
If you have the same values in column A, you will need to find a way to signal that a 3 on second row is not the same as 3 on the last row, which we can do with diff()
and cumsum()
. Then it's again a matter of bfill()
and groupby
with mean()
:
df = pd.DataFrame({'A': {0: np.nan, 1: 3, 2: np.nan, 3: np.nan, 4: 2, 5:np.nan, 6: np.nan, 7: 3}, 'B': {0: 1, 1: 3, 2: 5, 3: 7, 4: 9, 5: 11, 6: 13, 7: 15}, 'C': {0: 2, 1: 4, 2: 6, 3: 8, 4: 10, 5: 12, 6: 14, 7: 16}})
df.assign(new_A = (df.A.bfill().diff() != 0).cumsum()
).groupby(['new_A']).mean().reset_index(drop=True)
A B C
0 3.0 2.0 3.0
1 2.0 7.0 8.0
2 3.0 13.0 14.0
pd.__version__
Out[46]: '1.3.5'
Upvotes: 2