Reputation: 33
I have a dataframe, let's call it 'data', as follows:
index ID name
0 23 aaa
1 42 bbb
2 23 aab
3 42 bbb
4 42 bbb
...
I want to count the occurences of ID and create an extra column for that by which I can sort. Additionally I want to add the names to sets, because they might differ. Something like this (additional index optional):
count ID name
3 42 {bbb}
2 23 {aaa, aab}
I know the solution is somewhere in the group_by() function. I can put the names into sets with data.groupby('ID')['name'].apply(set).reset_index()
but how do I additionally count the occurrences of ID and add the numbers to the DataFrame correctly? I'm standing on the hose, as the German says. Thanks a lot!
Upvotes: 2
Views: 59
Reputation: 184
You can use this:
df.groupby(['ID'])['name'].agg([set,'count']).rename(columns={'set':'name'}).reset_index()
Output:
ID name count
0 23 {aaa, aab} 2
1 42 {bbb} 3
Upvotes: 1
Reputation: 13821
You can use groupBy.agg()
as follows:
df.groupby('ID').agg({'ID':'count',
'name':set}).rename(columns={'ID':'ID_count'}).reset_index()
prints:
ID ID_count name
0 23 2 {aaa, aab}
1 42 3 {bbb}
Upvotes: 1
Reputation: 195438
You can use .agg
with multiple parameters:
x = df.groupby("ID", as_index=False).agg(
count=("ID", "size"), name=("name", set)
)
print(x)
Prints:
ID count name
0 23 2 {aaa, aab}
1 42 3 {bbb}
Upvotes: 1