I have a pandas DataFrame df and I want to group by the text column with two aggregations: collect the english_word values into a list and sum the count column.
Right now I can only do one or the other: build the english_word list, or sum the count column. I tried to do both at once, but it returns an error. How can I apply both aggregations?
In short, this is what I want:
text              english_word    count
Saya eat chicken  [eat, chicken]  2
This is what I tried:
df.groupby('text', as_index=False).agg({'count' : lambda x: x.sum(), 'english_word' : lambda x: x.list()})
Here is an example df:
import pandas as pd
df = pd.DataFrame({'text': ['Saya eat chicken', 'Saya eat chicken'],
                   'english_word': ['eat', 'chicken'],
                   'count': [1, 1]})
Something like this? Summarise each group with a function and apply it per group:
import pandas as pd

def summarise(group):
    # One output row per group: the summed count and the words as a list
    return pd.Series({'Count': group['count'].sum(),
                      'Words': list(group['english_word'])})

new_df = df.groupby('text').apply(summarise)
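A minimal, self-contained run of this per-group function approach on the question's sample data. One assumption beyond the answer: the columns are selected before apply, so the grouping column text is not passed into summarise. That selection is not required for the result, but it avoids the deprecation warning recent pandas versions emit when apply operates on grouping columns.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'text': ['Saya eat chicken', 'Saya eat chicken'],
                   'english_word': ['eat', 'chicken'],
                   'count': [1, 1]})

def summarise(group):
    # One output row per group: the summed count and the words as a Python list
    return pd.Series({'Count': group['count'].sum(),
                      'Words': list(group['english_word'])})

# Select only the columns summarise uses, then turn the 'text' index back into a column
new_df = (df.groupby('text')[['count', 'english_word']]
            .apply(summarise)
            .reset_index())
print(new_df)
```

Each group yields one pd.Series, and pandas stacks those Series into one row per text value.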
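You are almost there: x.list() does not turn a group into a Python list, so pass the built-in list as the aggregation function instead. To add up the values use 'sum'; 'count' only counts the rows in each group:
s = df.groupby('text').agg({'word': list, 'num': 'sum'}).reset_index()
print(s)
Output:
   text       word  num
0   bla  [i, love]    3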
Sample data:
import pandas as pd
df = pd.DataFrame({'text': ['bla', 'bla'],
                   'word': ['i', 'love'],
                   'num': [1, 2]})
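With the question's own column names, pandas named aggregation (available since pandas 0.25) is another way to write the same two aggregations: each keyword names an output column and maps it to an (input column, function) pair. A minimal sketch on the question's sample df, keeping as_index=False from the original attempt:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'text': ['Saya eat chicken', 'Saya eat chicken'],
                   'english_word': ['eat', 'chicken'],
                   'count': [1, 1]})

# Named aggregation: output_column=(input_column, function).
# The built-in list collects each group's words; 'sum' adds up the counts.
result = df.groupby('text', as_index=False).agg(
    english_word=('english_word', list),
    count=('count', 'sum'),
)
print(result)
```

Using 'count' instead of 'sum' here would count the rows per group. With the question's data both give 2, but they differ as soon as any count value is not 1.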