Eleanor Abernathy
Eleanor Abernathy

Reputation: 1

Efficient way of counting observations in Pandas Dataframe

I have the following question. I have a Python database containing clients of a hotel according to their country of origin (first two columns):

  Hotel name Country  Count
0    Hotel A      US      2
1    Hotel A      UK      1
2    Hotel A      US      2
3    Hotel B      UK      2
4    Hotel B      UK      2

This means, in Hotel A 2 clientes were from US and 1 from UK and in Hotel B they had 2 clients from UK and no clients from US. What I want is the third column, this means, the number of observations by country of origin that a hotel has. Up to now the only way in which I managed to do this is using a for loop, but as I have 10 million observations I wonder if there's a more efficient/faster way.

Upvotes: 0

Views: 46

Answers (1)

mozway
mozway

Reputation: 260640

You can groupby Hotel/country and transform('count'):

df['Count'] = df.groupby(['Hotel name', 'Country']).transform('count')

output:

  Hotel name Country  Count
0    Hotel A      US      2
1    Hotel A      UK      1
2    Hotel A      US      2
3    Hotel B      UK      2
4    Hotel B      UK      2

Upvotes: 1

Related Questions