Kelvin Ng
Kelvin Ng

Reputation: 45

Python Pandas groupby multiple columns

thank you for your help.

I have data that looks like this:

city,  room_type
A, X
A, Y
A, Z
B, X
B, Y
B, Y

I want my end result to look like this:

city, count(X), count(Y), count(z) 
A,  1, 1, 1
B,  1, 2, 0

I am grouping by city and I want to show the count of each room_type in each city.

Any way to do this with python pandas? Thank you.

I learned SQL years ago and think that it may have been possible. I'm sure python can do the same. Thanks!

Upvotes: 3

Views: 6260

Answers (2)

piRSquared
piRSquared

Reputation: 294258

A solution jezrael didn't give ;-)

s = pd.value_counts([tuple(i) for i in df.values.tolist()])
s.index = pd.MultiIndex.from_tuples(s.index.values, names=['city', None])
s.unstack(fill_value=0).rename(columns='count({})'.format).reset_index()

  city  count(X)  count(Y)  count(Z)
0    A         1         1         1
1    B         1         2         0

More involved

cities = pd.unique(df.city)
room_types = pd.unique(df.room_type)
d1 = pd.DataFrame(
    np.zeros((len(cities), len(room_types)), dtype=int),
    cities,
    room_types
)
for r, c in df.values:
    d1.set_value(r, c, d1.get_value(r, c) + 1)

d1.rename(columns='count({})'.format).rename_axis('city').reset_index()

Variation of first solution

from collections import Counter

pd.Series(
    Counter(map(tuple, df.values.tolist()))
).unstack(fill_value=0).rename(
    columns='count({})'.format
).rename_axis('city').reset_index()

Upvotes: 2

jezrael
jezrael

Reputation: 862611

You can use crosstab with rename columns:

df = pd.crosstab(df.city, df.room_type).rename(columns=lambda x: 'count({})'.format(x))
print (df)
room_type  count(X)  count(Y)  count(Z)
city                                   
A                 1         1         1
B                 1         2         0

Another solutions with groupby and size or value_counts, for reshape is used unstack:

df = df.groupby(['city', 'room_type']).size().unstack(fill_value=0)
       .rename(columns=lambda x: 'count({})'.format(x))
print (df)
room_type  count(X)  count(Y)  count(Z)
city                                   
A                 1         1         1
B                 1         2         0

df = df.groupby('city')['room_type'].value_counts().unstack(fill_value=0)
       .rename(columns=lambda x: 'count({})'.format(x))
print (df)
room_type  count(X)  count(Y)  count(Z)
city                                   
A                 1         1         1
B                 1         2         0

Upvotes: 5

Related Questions