harry04
harry04

Reputation: 962

Grouping by substring of a column value in Pandas

While grouping a pandas dataframe, I've found a problem in the data that doesn't group my dataframe effectively, and now my grouping looks like -

challenge                      count    mean
['acsc1', '[object Object]']    1   0.000000
['acsc1', 'undefined']          1   0.000000
['acsc1', 'wind-for']          99   379.284146
['acsc1']                      47   19.340045
['acsc10', 'wind-for']         73   370.148354
['acsc10']                     22   143.580856

How can I group these rows starting with ascs1 as one row (summing the other column values) and acsc10 into one row and so on? The desired result should look something like -

challenge       category       count    mean
acsc1           wind-for       148      398.62
acsc10          wind-for        95      513.72

But I know the category column might be a stretch with the noise in this column.

Upvotes: 0

Views: 988

Answers (2)

BENY
BENY

Reputation: 323276

We can do

s=pd.DataFrame(df['challenge'].tolist(),index=df.index,columns=['challenge','cate'])
d={'cate':'last','count':'count','mean':'sum'}
df=pd.concat([df.drop('challenge',1),s],axis=1).\
    groupby('challenge').agg(d).reset_index()

Update fix the string type list

import ast
df.challenge=df.challenge.apply(ast.literal_eval)
df.groupby(df.challenge.str[0]).sum()
           count        mean
challenge                   
acsc1        148  398.624191
acsc10        95  513.729210

Upvotes: 1

moys
moys

Reputation: 8033

This should get you the result you requested initially (without the category column) df.groupby(df.challenge.apply(lambda x: x.split(",")[0].strip("[']"))).sum().reset_index()

Output

challenge   count   mean
0   acsc1   148     398.624191
1   acsc10  95      513.729210

Upvotes: 1

Related Questions