GNMO11

Reputation: 2259

Pandas create new column with count from groupby

I have a df that looks like the following:

id        item        color
01        truck       red
02        truck       red
03        car         black
04        truck       blue
05        car         black

I am trying to create a df that looks like this:

item      color       count
truck     red          2
truck     blue         1
car       black        2

I have tried

df["count"] = df.groupby("item")["color"].transform('count')

But it is not quite what I am searching for.

Any guidance is appreciated.

Upvotes: 87

Views: 205184

Answers (5)

rachwa

Reputation: 2300

You can use value_counts (available on DataFrames since pandas 1.1) and name the count column with reset_index:

In [3]: df[['item', 'color']].value_counts().reset_index(name='counts')
Out[3]: 
    item  color  counts
0    car  black       2
1  truck    red       2
2  truck   blue       1

Upvotes: 9

Cannon Lock

Reputation: 21

An option that is more literal than the accepted answer:

df.groupby(["item", "color"], as_index=False).agg(count=("item", "count"))

Any column name can be used in place of "item" in the aggregation.

"as_index=False" prevents the grouped column from becoming the index.

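To make the effect of as_index concrete, here is a minimal sketch that rebuilds the question's sample data and runs the same aggregation both ways, counting the id column. The names indexed and flat are only illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05"],
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

# Default (as_index=True): the grouping keys become a MultiIndex.
indexed = df.groupby(["item", "color"]).agg(count=("id", "count"))

# as_index=False: the grouping keys stay as ordinary columns.
flat = df.groupby(["item", "color"], as_index=False).agg(count=("id", "count"))

print(list(indexed.index.names), list(indexed.columns))
print(list(flat.columns))
```

With the default, only the aggregated column remains among the columns. With as_index=False, the result already has the flat item / color / count layout asked for in the question.
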
Upvotes: 2

Adrian Keister

Reputation: 1025

Here is another option:

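import numpy as np
# Placeholder column; count() tallies its non-null values per group.
df['Counts'] = np.zeros(len(df))
# Select only 'Counts' so that the 'id' column is not counted as well.
grp_df = df.groupby(['item', 'color'])[['Counts']].count()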

which results in

             Counts
item  color        
car   black       2
truck blue        1
      red         2

Upvotes: 5

Jadon Manilall

Reputation: 637

Another way to achieve the desired output is to use named aggregation, which lets you specify the name and the aggregation function for each output column.

Named aggregation

(New in version 0.25.0.)

To support column-specific aggregation with control over the output column names, pandas accepts the special syntax in GroupBy.agg(), known as “named aggregation”, where:

  • The keywords are the output column names

  • The values are tuples whose first element is the column to select and the second element is the aggregation to apply to that column. Pandas provides the pandas.NamedAgg named tuple with the fields ['column','aggfunc'] to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias.

So, to get the desired output, you could try something like this:

import pandas as pd
# Setup
df = pd.DataFrame([
    {
        "item":"truck",
        "color":"red"
    },
    {
        "item":"truck",
        "color":"red"
    },
    {
        "item":"car",
        "color":"black"
    },
    {
        "item":"truck",
        "color":"blue"
    },
    {
        "item":"car",
        "color":"black"
    }
])

df_grouped = df.groupby(["item", "color"]).agg(
    count_col=pd.NamedAgg(column="color", aggfunc="count")
)
print(df_grouped)

Which produces the following output:

             count_col
item  color
car   black          2
truck blue           1
      red            2
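
If a flat table like the one in the question is wanted instead of a MultiIndex, one possible sketch is to chain reset_index onto the named aggregation. The frame below is rebuilt with the question's id column (an assumption, since the setup above omits it), the tuple ("id", "count") is the shorthand for pd.NamedAgg, and df_flat is just an illustrative name.

```python
import pandas as pd

# Sample data from the question, including the id column.
df = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05"],
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

df_flat = (
    df.groupby(["item", "color"])
      # Equivalent to pd.NamedAgg(column="id", aggfunc="count").
      .agg(count=("id", "count"))
      # Move the grouping keys from the index back into columns.
      .reset_index()
)
print(df_flat)
```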

Upvotes: 34

Andy Hayden

Reputation: 375475

That's not a new column, that's a new DataFrame:

In [11]: df.groupby(["item", "color"]).count()
Out[11]:
             id
item  color
car   black   2
truck blue    1
      red     2

To get the result you want, use reset_index:

In [12]: df.groupby(["item", "color"])["id"].count().reset_index(name="count")
Out[12]:
    item  color  count
0    car  black      2
1  truck   blue      1
2  truck    red      2
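
One caveat: count skips missing values in the selected column, whereas size counts rows. A small sketch, assuming a hypothetical variant of the sample data in which one id is missing (by_count and by_size are illustrative names):

```python
import pandas as pd

# Hypothetical variant of the question's data: the second id is missing.
df = pd.DataFrame({
    "id": ["01", None, "03", "04", "05"],
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

grouped = df.groupby(["item", "color"])

# count() ignores the missing id, so (truck, red) is counted once.
by_count = grouped["id"].count().reset_index(name="count")

# size() counts rows regardless of missing values, so (truck, red) is 2.
by_size = grouped.size().reset_index(name="count")

print(by_count)
print(by_size)
```

If the goal is the number of rows per group, size is the safer choice whenever the counted column may contain missing values.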

To get a "new column" you could use transform:

In [13]: df.groupby(["item", "color"])["id"].transform("count")
Out[13]:
0    2
1    2
2    2
3    1
4    2
dtype: int64
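
As a sketch of what assigning that result back looks like, using the sample data from the question (the column name "count" is an arbitrary choice):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05"],
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

# transform returns one value per original row, aligned on the index,
# so it can be assigned directly as a new column.
df["count"] = df.groupby(["item", "color"])["id"].transform("count")
print(df)
```

The frame keeps all five rows, and each row carries the size of its (item, color) group.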

I recommend reading the split-apply-combine section of the docs.

Upvotes: 150
