Dino
Dino

Reputation: 11

Pandas averaging by specific row

I have table taht looks like following

A | B | C
__________
na| 1 | 2
3 | 3 | 4
na| 5 | 6
na| 7 | 8
2 | 9 | 10
na| 11| 12
na| 13| 14
3 | 15| 16

I would like to average all by column A. All the nan values till the number, final outcome should look like

A | B | C
__________
3 | 2 | 3
2 | 7 | 8
3 |13 |14

Upvotes: 1

Views: 103

Answers (2)

René
René

Reputation: 4827

df = pd.DataFrame({'A':[pd.NA,3,pd.NA,pd.NA,2,pd.NA,pd.NA,3], 'B':range(1,16,2), 'C':range(2,17,2)})
df.assign(idx = [i if i in df.index[df.A.notna()] else pd.NA for i in df.index]).bfill().groupby('idx').mean().astype(int).reset_index(drop=True)

Output:

   A   B   C
0  3   2   3
1  2   7   8
2  3  13  14

Upvotes: 2

sophocles
sophocles

Reputation: 13841

You could backward fill column A and then perform a groupby with mean():

df.assign(A = lambda col:col['A'].bfill()).groupby('A',as_index=False).mean()

prints:

     A    B    C
0  2.0  7.0  8.0
1  3.0  2.0  3.0

EDIT:

If you have the same values in column A, you will need to find a way to signal that a 3 on second row is not the same as 3 on the last row, which we can do with diff() and cumsum(). Then it's again a matter of bfill() and groupby with mean():

df = pd.DataFrame({'A': {0: np.nan, 1: 3, 2: np.nan, 3: np.nan, 4: 2, 5:np.nan, 6: np.nan, 7: 3}, 'B': {0: 1, 1: 3, 2: 5, 3: 7, 4: 9, 5: 11, 6: 13, 7: 15}, 'C': {0: 2, 1: 4, 2: 6, 3: 8, 4: 10, 5: 12, 6: 14, 7: 16}})

df.assign(new_A = (df.A.bfill().diff() != 0).cumsum()
          ).groupby(['new_A']).mean().reset_index(drop=True)

     A     B     C
0  3.0   2.0   3.0
1  2.0   7.0   8.0
2  3.0  13.0  14.0

pd.__version__
Out[46]: '1.3.5'

Upvotes: 2

Related Questions