Pandas groupby mean issue

Question

I am trying to find the mean by event for each round (R1, R2, R3, R4). Unfortunately, rounds that were not played are represented in three different ways: 0, - or an empty cell.

    event   plyr    R1  R2  R3  R4
0   Houston Dave    67  90.0    70  72
1   Houston Bobx    69  69.0    69  69
2   Houston Carlx   69  71.0    71  71
3   Miamixx Cliff   67  70.0    70  70
4   Miamixx Dean    70  71.0    71  71
5   Miamixx Clive   69  69.0    -   0
6   Miamixx Patxx   71  70.0    -   0
7   Atlanta Phil    67  70.0    70  72
8   Atlanta Dave    69  NaN 71  73
9   Atlanta Bobx    69  NaN -   0

I have tried replacing the 0 and - with NaN, but the results are still off (R3 is missing from the output and the R4 means are too low):

df['R3'] = df['R3'].replace(['0', '-'], np.nan)
df['R4'] = df['R4'].replace(['0', '-'], np.nan)

The results

df.groupby('event')[['R1', 'R2', 'R3', 'R4']].mean()

                R1         R2         R4
event
Atlanta  68.333333  70.000000  48.333333
Houston  68.333333  76.666667  70.666667
Miamixx  69.250000  70.000000  35.250000
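A minimal sketch of why the R4 means come out low, assuming R4 holds plain integers (the printed means are consistent with that, but the actual dtype isn't shown). The list ['0', '-'] contains the string '0', which never matches the integer 0, so the zeros survive the replacement and are averaged in. The frame below is a hypothetical reconstruction of just the event and R4 columns.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the event and R4 columns from the table above,
# assuming R4 is an integer column (an assumption; the real dtype isn't shown).
df = pd.DataFrame({
    "event": ["Houston"] * 3 + ["Miamixx"] * 4 + ["Atlanta"] * 3,
    "R4": [72, 69, 71, 70, 71, 0, 0, 72, 73, 0],
})

# The question's replacement: '0' is a string, so it never matches the integer 0.
df["R4"] = df["R4"].replace(["0", "-"], np.nan)

print(df["R4"].isna().sum())             # nothing was replaced
print(df.groupby("event")["R4"].mean())  # the zeros still pull the averages down
```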

CHRD · Accepted Answer

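The groupby mean aggregation excludes NaN values but includes zeros. A 0 is counted as a played round and pulls the average down, while a NaN is skipped, so whether you replace the missing rounds with 0 or keep them as NaN depends on the result you're after.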
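A small sketch of that difference, using two hypothetical events that each have one unplayed round: event A stores it as NaN, event B stores it as 0.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the third round of A is NaN, the third round of B is 0.
df = pd.DataFrame({
    "event": ["A", "A", "A", "B", "B", "B"],
    "R4": [70.0, 72.0, np.nan, 70.0, 72.0, 0.0],
})

means = df.groupby("event")["R4"].mean()
print(means)
# A: NaN is skipped, so the mean is taken over two rounds (71.0)
# B: 0 is counted, so the mean is taken over three rounds (about 47.33)
```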

This will set all the - and NaN values to 0:

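import numpy as np
import pandas as pd

cols = ['R1', 'R2', 'R3', 'R4']

for col in cols:
    # '-' and missing cells become 0; the column is then converted to a numeric dtype
    df[col] = np.where((df[col] == '-') | df[col].isnull(), 0, df[col])
    df[col] = pd.to_numeric(df[col])

# Average only the round columns; 'plyr' is text and recent pandas versions refuse to average it
df.groupby('event')[cols].mean()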
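Run on a hypothetical reconstruction of the question's table (the dtypes are assumed: R3 mixes integers with the string '-', R4 holds integers), the loop produces a mean for every round, but each unplayed round is averaged in as a 0. For example, Atlanta's R2 mean drops to about 23.3 because two of its three entries were NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's table; the dtypes are assumptions.
df = pd.DataFrame({
    "event": ["Houston"] * 3 + ["Miamixx"] * 4 + ["Atlanta"] * 3,
    "plyr": ["Dave", "Bobx", "Carlx", "Cliff", "Dean", "Clive", "Patxx",
             "Phil", "Dave", "Bobx"],
    "R1": [67, 69, 69, 67, 70, 69, 71, 67, 69, 69],
    "R2": [90.0, 69.0, 71.0, 70.0, 71.0, 69.0, 70.0, 70.0, np.nan, np.nan],
    "R3": [70, 69, 71, 70, 71, "-", "-", 70, 71, "-"],
    "R4": [72, 69, 71, 70, 71, 0, 0, 72, 73, 0],
})

cols = ["R1", "R2", "R3", "R4"]

# The accepted answer's loop: '-' and NaN become 0, then each column is made numeric.
for col in cols:
    df[col] = np.where((df[col] == "-") | df[col].isnull(), 0, df[col])
    df[col] = pd.to_numeric(df[col])

result = df.groupby("event")[cols].mean()
print(result)
```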

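If you want NaN instead of 0, so that unplayed rounds are left out of the mean, replace the 0 in np.where() with np.nan. Rounds that are already stored as 0 are not matched by that condition, so add them to it as well: (df[col] == '-') | df[col].isnull() | (df[col] == 0).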
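A minimal sketch of the NaN route without the loop, swapping in pd.to_numeric(errors="coerce") and DataFrame.mask instead of np.where. It assumes the three markers are exactly 0, the string '-' and an empty cell (NaN or an empty string), and that 0 is never a real score. The data is the same hypothetical reconstruction of the question's table.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's table; the dtypes are assumptions.
df = pd.DataFrame({
    "event": ["Houston"] * 3 + ["Miamixx"] * 4 + ["Atlanta"] * 3,
    "R1": [67, 69, 69, 67, 70, 69, 71, 67, 69, 69],
    "R2": [90.0, 69.0, 71.0, 70.0, 71.0, 69.0, 70.0, 70.0, np.nan, np.nan],
    "R3": [70, 69, 71, 70, 71, "-", "-", 70, 71, "-"],
    "R4": [72, 69, 71, 70, 71, 0, 0, 72, 73, 0],
})

cols = ["R1", "R2", "R3", "R4"]

# errors="coerce" turns '-' (and any other non-numeric text) into NaN;
# mask() then turns every 0 into NaN as well.
clean = df[cols].apply(pd.to_numeric, errors="coerce")
clean = clean.mask(clean == 0)

# mean() skips NaN, so unplayed rounds no longer count towards the average.
result = clean.groupby(df["event"]).mean()
print(result)
```

Compared with filling with 0, this keeps each average over the rounds actually played, at the cost of also discarding any genuine 0 values.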
