Reputation: 962
While grouping a pandas dataframe, I've found a problem in the data that doesn't group my dataframe effectively, and now my grouping looks like -
challenge count mean
['acsc1', '[object Object]'] 1 0.000000
['acsc1', 'undefined'] 1 0.000000
['acsc1', 'wind-for'] 99 379.284146
['acsc1'] 47 19.340045
['acsc10', 'wind-for'] 73 370.148354
['acsc10'] 22 143.580856
How can I group these rows starting with ascs1
as one row (summing the other column values) and acsc10
into one row and so on? The desired result should look something like -
challenge category count mean
acsc1 wind-for 148 398.62
acsc10 wind-for 95 513.72
But I know the category column might be a stretch with the noise in this column.
Upvotes: 0
Views: 988
Reputation: 323276
We can do
s=pd.DataFrame(df['challenge'].tolist(),index=df.index,columns=['challenge','cate'])
d={'cate':'last','count':'count','mean':'sum'}
df=pd.concat([df.drop('challenge',1),s],axis=1).\
groupby('challenge').agg(d).reset_index()
Update fix the string type list
import ast
df.challenge=df.challenge.apply(ast.literal_eval)
df.groupby(df.challenge.str[0]).sum()
count mean
challenge
acsc1 148 398.624191
acsc10 95 513.729210
Upvotes: 1
Reputation: 8033
This should get you the result you requested initially (without the category column)
df.groupby(df.challenge.apply(lambda x: x.split(",")[0].strip("[']"))).sum().reset_index()
Output
challenge count mean
0 acsc1 148 398.624191
1 acsc10 95 513.729210
Upvotes: 1