Grouping by substring of a column value in Pandas

Question

While grouping a pandas dataframe, I've found a problem in the data that doesn't group my dataframe effectively, and now my grouping looks like -

challenge                      count    mean
['acsc1', '[object Object]']    1   0.000000
['acsc1', 'undefined']          1   0.000000
['acsc1', 'wind-for']          99   379.284146
['acsc1']                      47   19.340045
['acsc10', 'wind-for']         73   370.148354
['acsc10']                     22   143.580856

How can I group these rows starting with ascs1 as one row (summing the other column values) and acsc10 into one row and so on? The desired result should look something like -

challenge       category       count    mean
acsc1           wind-for       148      398.62
acsc10          wind-for        95      513.72

But I know the category column might be a stretch with the noise in this column.

moys · Accepted Answer

This should get you the result you requested initially (without the category column) df.groupby(df.challenge.apply(lambda x: x.split(",")[0].strip("[']"))).sum().reset_index()

Output

challenge   count   mean
0   acsc1   148     398.624191
1   acsc10  95      513.729210

Grouping by substring of a column value in Pandas

Answers (2)

Related Questions