Reputation: 919
I have a large DataFrame looking like this
name Country ...
1 Paul Germany
2 Paul Germany
3 George Italy
3 George Italy
3 George Italy
...
N John USA
I'm looking for the occurence of each element of the name column, such has
name Country Count
1 Paul Germany 2000
2 George Italy 500
...
N John USA 40000
Any idea what is the most optimal way to do it ?
Because this is quite long
df['count'] = df.groupby(['name'])['name'].transform(pd.Series.value_counts)
Upvotes: 1
Views: 50
Reputation: 648
If you just want to the counts with respect to the name
column, you don't need to use groupby
, you can just use select the name
column from the DataFrame (which returns a Series
object) and call value_counts()
on it directly:
df['name'].value_counts()
Upvotes: 0
Reputation: 14001
you can do it like this:
df.groupby(['name', 'Country']).size()
example:
import pandas as pd
df = pd.DataFrame.from_dict({'name' : ['paul', 'paul', 'George', 'George', 'George'],
'Country': ['Germany', 'Italy','Germany','Italy','Italy']})
df
output:
Country name
0 Germany paul
1 Italy paul
2 Germany George
3 Italy George
4 Italy George
Group by and get count:
df.groupby(['name', 'Country']).size()
output:
name Country
George Germany 1
Italy 2
paul Germany 1
Italy 1
Upvotes: 1