Reputation: 45
thank you for your help.
I have data that looks like this:
city, room_type
A, X
A, Y
A, Z
B, X
B, Y
B, Y
I want my end result to look like this:
city, count(X), count(Y), count(z)
A, 1, 1, 1
B, 1, 2, 0
I am grouping by city and I want to show the count of each room_type in each city.
Any way to do this with python pandas? Thank you.
I learned SQL years ago and think that it may have been possible. I'm sure python can do the same. Thanks!
Upvotes: 3
Views: 6260
Reputation: 294258
A solution jezrael didn't give ;-)
s = pd.value_counts([tuple(i) for i in df.values.tolist()])
s.index = pd.MultiIndex.from_tuples(s.index.values, names=['city', None])
s.unstack(fill_value=0).rename(columns='count({})'.format).reset_index()
city count(X) count(Y) count(Z)
0 A 1 1 1
1 B 1 2 0
More involved
cities = pd.unique(df.city)
room_types = pd.unique(df.room_type)
d1 = pd.DataFrame(
np.zeros((len(cities), len(room_types)), dtype=int),
cities,
room_types
)
for r, c in df.values:
d1.set_value(r, c, d1.get_value(r, c) + 1)
d1.rename(columns='count({})'.format).rename_axis('city').reset_index()
Variation of first solution
from collections import Counter
pd.Series(
Counter(map(tuple, df.values.tolist()))
).unstack(fill_value=0).rename(
columns='count({})'.format
).rename_axis('city').reset_index()
Upvotes: 2
Reputation: 862611
You can use crosstab
with rename
columns:
df = pd.crosstab(df.city, df.room_type).rename(columns=lambda x: 'count({})'.format(x))
print (df)
room_type count(X) count(Y) count(Z)
city
A 1 1 1
B 1 2 0
Another solutions with groupby
and size
or value_counts
, for reshape is used unstack
:
df = df.groupby(['city', 'room_type']).size().unstack(fill_value=0)
.rename(columns=lambda x: 'count({})'.format(x))
print (df)
room_type count(X) count(Y) count(Z)
city
A 1 1 1
B 1 2 0
df = df.groupby('city')['room_type'].value_counts().unstack(fill_value=0)
.rename(columns=lambda x: 'count({})'.format(x))
print (df)
room_type count(X) count(Y) count(Z)
city
A 1 1 1
B 1 2 0
Upvotes: 5