Reputation: 1439
Given a dataframe that logs uses of some books like this:
Name Type ID
Book1 ebook 1
Book2 paper 2
Book3 paper 3
Book1 ebook 1
Book2 paper 2
I need to get the count of each book while keeping the other columns, to get this:
Name Type ID Count
Book1 ebook 1 2
Book2 paper 2 2
Book3 paper 3 1
How can this be done?
Thanks!
Upvotes: 128
Views: 237420
Reputation: 2014
If you have many columns in a df, it makes sense to use df.groupby(['ID']).agg(Count=('ID', 'count'),...), see here. The .agg() function allows you to choose what to do with the columns you don't want to apply operations on. If you just want to keep them (or, more precisely, keep the first entry in each), use .agg(Count=('ID', 'count'), col1=('col1', 'first'), col2=('col2', 'first'),...). Instead of 'first', you can also apply 'sum', 'mean' and others.
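A minimal runnable sketch of this named-aggregation approach, recreating the question's data (the column choices here are illustrative):

```python
import pandas as pd

# Sample frame matching the question's data
df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

# Named aggregation: count the IDs per group, keep the first value
# of every other column
out = df.groupby("Name", as_index=False).agg(
    Type=("Type", "first"),
    ID=("ID", "first"),
    Count=("ID", "count"),
)
print(out)
```

This keeps 'Type' and 'ID' in the result without having to list them as grouping keys.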
Upvotes: 35
Reputation: 521
SIMPLEST WAY
df.groupby(['col1', 'col2'], as_index=False).count(). Use as_index=False to retain the grouping columns as regular columns; the default is True.
You can also use df.groupby(['col_1', 'col_2']).count().reset_index()
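A small sketch showing that the two forms are equivalent, using data recreated from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

# Both forms return the group keys as regular columns
a = df.groupby(["Name", "Type"], as_index=False).count()
b = df.groupby(["Name", "Type"]).count().reset_index()
print(a.equals(b))  # True
```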
Upvotes: 4
Reputation: 8768
You can use value_counts() as well:
df.value_counts().reset_index(name='Count')
Output:
Name Type ID Count
0 Book1 ebook 1 2
1 Book2 paper 2 2
2 Book3 paper 3 1
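As a self-contained sketch (data recreated from the question): DataFrame.value_counts() counts each unique row across all columns, which is exactly the grouping wanted here.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

# value_counts() counts each unique row; reset_index turns the
# resulting Series back into a frame, naming the counts column
counts = df.value_counts().reset_index(name="Count")
print(counts)
```

Note the result is sorted by count descending, not by the original row order.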
Upvotes: 1
Reputation: 3701
I think as_index=False should do the trick:
df.groupby(['Name', 'Type', 'ID'], as_index=False).size()
(Note that .count() only tallies the columns that aren't grouping keys, and here every column is one, so .size() is needed; it returns the per-group row count in a column named size, which you can rename to Count.)
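A runnable sketch, recreating the question's data. Since .count() only tallies columns that are not grouping keys, grouping on all three columns calls for .size() to get the row counts:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

# .size() counts rows per group; as_index=False keeps the keys as
# columns, and the counts arrive in a column named "size"
out = (
    df.groupby(["Name", "Type", "ID"], as_index=False)
      .size()
      .rename(columns={"size": "Count"})
)
print(out)
```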
Upvotes: 126
Reputation: 393893
You want the following:
In [20]:
df.groupby(['Name','Type','ID']).size().reset_index(name='Count')
Out[20]:
Name Type ID Count
0 Book1 ebook 1 2
1 Book2 paper 2 2
2 Book3 paper 3 1
In your case the 'Name', 'Type' and 'ID' cols repeat together, so we can groupby on these, call size and then reset_index. (A plain count() would find no remaining columns to count here, since all three are grouping keys.)
An alternative approach would be to add the 'Count' column using transform and then call drop_duplicates:
In [25]:
df['Count'] = df.groupby(['Name'])['ID'].transform('count')
df.drop_duplicates()
Out[25]:
Name Type ID Count
0 Book1 ebook 1 2
1 Book2 paper 2 2
2 Book3 paper 3 1
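A self-contained sketch of the transform approach, recreating the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

# transform('count') broadcasts each group's count back onto every
# row, so the fully duplicated rows can then be dropped
df["Count"] = df.groupby("Name")["ID"].transform("count")
result = df.drop_duplicates()
print(result)
```

This variant is handy when the frame has columns that differ within a group, since drop_duplicates will then keep those rows rather than collapsing them.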
Upvotes: 138