Reputation: 1
I have the following question. I have a Python database containing clients of a hotel according to their country of origin (first two columns):
Hotel name Country Count
0 Hotel A US 2
1 Hotel A UK 1
2 Hotel A US 2
3 Hotel B UK 2
4 Hotel B UK 2
This means, in Hotel A 2 clientes were from US and 1 from UK and in Hotel B they had 2 clients from UK and no clients from US. What I want is the third column, this means, the number of observations by country of origin that a hotel has. Up to now the only way in which I managed to do this is using a for loop, but as I have 10 million observations I wonder if there's a more efficient/faster way.
Upvotes: 0
Views: 46
Reputation: 260640
You can groupby
Hotel/country and transform('count')
:
df['Count'] = df.groupby(['Hotel name', 'Country']).transform('count')
output:
Hotel name Country Count
0 Hotel A US 2
1 Hotel A UK 1
2 Hotel A US 2
3 Hotel B UK 2
4 Hotel B UK 2
Upvotes: 1