Display multiple columns in Pandas Dataframe, but group by and count only one

Question

I have a dataframe similar to the following:

df:
facility,  location,  nickname
factory,   floor,     flr
office,    reception, rec
office,    execsuite, es
office,    cubicle,   cub
training,  conference,conf

My desired output is a grouped list with a count of "facility" and all values from the other columns displayed.

facility,  count,  location,   nickname
factory,   1,      floor,      flr
office,    3,      reception,  rec
                   execsuite,  es
                   cubicle,    cub
training,  1,      conference, conf

The following output would also work:

facility,  location,  nickname,  count
factory,   floor,     flr,       1
office,    reception, rec,       3
           execsuite, es,        3
           cubicle,   cub,       3
training,  conference,conf,      1

The problem I am having, is including other columns outside of the counted value and "group by" value without performing a merge. The following are some samples of things I have tried:

(1)  df['x'] = df.groupby('facility').apply(lambda x: df['facility'].count())

The 'x' column is returned as NAN on all values

(2)  df.groupby(['facility']).agg({'location': 'nunique'})

The count is returned correctly here, however, the columns 'location' and 'nickname' are not displayed. Attempting to add these to the groupby sets the count back to '1' for each by overaggregating.

(3)  df.groupby(['facility', 'location']).count()

Again, since I grouped by two columns, it is overaggregating - giving me counts of one everywhere.

(4) df.groupby('facility')['facility','location'].count()

Doesn't display the column text - only counts.

It appears what I need is one group by with the additional columns just displayed. I am not sure of the syntax to do this without a merge.

Dani Mesejo · Accepted Answer

You could groupby by facility and the set as index facility and count:

df['count'] = df.groupby('facility')['facility'].transform('size')
print(df.set_index(['facility', 'count']))

Output

                  location nickname
facility count                     
factory  1           floor      flr
office   3       reception      rec
         3       execsuite       es
         3         cubicle      cub
training 1      conference     conf

Display multiple columns in Pandas Dataframe, but group by and count only one

Answers (1)

Related Questions