Adrian Ribao
Adrian Ribao

Reputation: 1439

How to GroupBy a Dataframe in Pandas and keep Columns

given a dataframe that logs uses of some books like this:

Name   Type   ID
Book1  ebook  1
Book2  paper  2
Book3  paper  3
Book1  ebook  1
Book2  paper  2

I need to get the count of all the books, keeping the other columns and get this:

Name   Type   ID    Count
Book1  ebook  1     2
Book2  paper  2     2
Book3  paper  3     1

How can this be done?

Thanks!

Upvotes: 128

Views: 237420

Answers (5)

NeStack
NeStack

Reputation: 2014

If you have many columns in a df it makes sense to use df.groupby(['ID']).agg(Count=('ID', 'count'),...), see here. The .agg() function allows you to choose what to do with the columns you don't want to apply operations on. If you just want to keep them (or more precisely to keep the first entries in them), use .agg(Count=('ID', 'count'), col1=('col1', 'first'), col2=('col2', 'first'),...). Instead of 'first', you can also apply 'sum', 'mean' and others.

Upvotes: 35

Somyadeep Shrivastava
Somyadeep Shrivastava

Reputation: 521

SIMPLEST WAY

df.groupby(['col1', 'col1'], as_index=False).count(). Use as_index=False to retain column names. The default is True.

Also you can use df.groupby(['col_1', 'col_2']).count().reset_index()

Upvotes: 4

rhug123
rhug123

Reputation: 8768

You can use value_counts() as well:

df.value_counts().reset_index(name= 'Count')

Output:

    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1

Upvotes: 1

jpobst
jpobst

Reputation: 3701

I think as_index=False should do the trick.

df.groupby(['Name','Type','ID'], as_index=False).count()

Upvotes: 126

EdChum
EdChum

Reputation: 393893

You want the following:

In [20]:
df.groupby(['Name','Type','ID']).count().reset_index()

Out[20]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1

In your case the 'Name', 'Type' and 'ID' cols match in values so we can groupby on these, call count and then reset_index.

An alternative approach would be to add the 'Count' column using transform and then call drop_duplicates:

In [25]:
df['Count'] = df.groupby(['Name'])['ID'].transform('count')
df.drop_duplicates()

Out[25]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1

Upvotes: 138

Related Questions