bfalk
bfalk

Reputation: 33

Grouping a DataFrame, counting occurrences in one column, putting other column values in sets

I have a dataframe, let's call it 'data', as follows:

index  ID  name
0      23   aaa
1      42   bbb
2      23   aab
3      42   bbb
4      42   bbb
... 

I want to count the occurences of ID and create an extra column for that by which I can sort. Additionally I want to add the names to sets, because they might differ. Something like this (additional index optional):

count ID  name
3     42  {bbb}
2     23  {aaa, aab}

I know the solution is somewhere in the group_by() function. I can put the names into sets with data.groupby('ID')['name'].apply(set).reset_index() but how do I additionally count the occurrences of ID and add the numbers to the DataFrame correctly? I'm standing on the hose, as the German says. Thanks a lot!

Upvotes: 2

Views: 59

Answers (3)

Rajat Agarwal
Rajat Agarwal

Reputation: 184

You can use this:

df.groupby(['ID'])['name'].agg([set,'count']).rename(columns={'set':'name'}).reset_index()

Output:

   ID        name  count
0  23  {aaa, aab}      2
1  42       {bbb}      3


Upvotes: 1

sophocles
sophocles

Reputation: 13821

You can use groupBy.agg() as follows:

df.groupby('ID').agg({'ID':'count',
                      'name':set}).rename(columns={'ID':'ID_count'}).reset_index()

prints:

   ID  ID_count        name
0  23         2  {aaa, aab}
1  42         3       {bbb}

Upvotes: 1

Andrej Kesely
Andrej Kesely

Reputation: 195438

You can use .agg with multiple parameters:

x = df.groupby("ID", as_index=False).agg(
    count=("ID", "size"), name=("name", set)
)
print(x)

Prints:

   ID  count        name
0  23      2  {aaa, aab}
1  42      3       {bbb}

Upvotes: 1

Related Questions