Reputation: 311
I have a dataframe similar to the following:
df:
facility, location, nickname
factory, floor, flr
office, reception, rec
office, execsuite, es
office, cubicle, cub
training, conference, conf
My desired output is a grouped list with a count of rows per "facility", while still displaying the values from the other columns.
facility, count, location, nickname
factory, 1, floor, flr
office, 3, reception, rec
           execsuite, es
           cubicle, cub
training, 1, conference, conf
The following output would also work:
facility, location, nickname, count
factory, floor, flr, 1
office, reception, rec, 3
        execsuite, es, 3
        cubicle, cub, 3
training, conference, conf, 1
The problem I am having is including the other columns, beyond the counted value and the "group by" value, without performing a merge. The following are some samples of things I have tried:
(1) df['x'] = df.groupby('facility').apply(lambda x: df['facility'].count())
The 'x' column is returned as NaN for all values.
(2) df.groupby(['facility']).agg({'location': 'nunique'})
The count is returned correctly here; however, the columns 'location' and 'nickname' are not displayed. Adding them to the groupby over-aggregates and sets every count back to 1.
(3) df.groupby(['facility', 'location']).count()
Again, since I grouped by two columns, it over-aggregates and gives me counts of one everywhere.
(4) df.groupby('facility')['facility','location'].count()
This doesn't display the column text, only the counts.
It appears that what I need is a single groupby with the additional columns simply displayed; I am not sure of the syntax to do this without a merge.
Upvotes: 1
Views: 2078
Reputation: 61920
You could compute the count per facility with groupby and transform, and then set facility and count as the index:
df['count'] = df.groupby('facility')['facility'].transform('size')
print(df.set_index(['facility', 'count']))
Output
                  location  nickname
facility count
factory  1           floor       flr
office   3       reception       rec
         3       execsuite        es
         3         cubicle       cub
training 1      conference      conf
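For completeness, a self-contained sketch (the frame below is just the question's sample data typed out by hand) that reproduces both layouts the question asks for: the flat one with count as a regular column, and the grouped one with facility and count in the index.

import pandas as pd

# Rebuild the example frame from the question's sample data
df = pd.DataFrame({
    'facility': ['factory', 'office', 'office', 'office', 'training'],
    'location': ['floor', 'reception', 'execsuite', 'cubicle', 'conference'],
    'nickname': ['flr', 'rec', 'es', 'cub', 'conf'],
})

# transform('size') returns a Series aligned with the original rows,
# so every row receives the total row count of its facility group
df['count'] = df.groupby('facility')['facility'].transform('size')

# Flat layout (the question's second acceptable output): count as a regular column
print(df)

# Grouped layout (the question's first desired output): facility and count in the index
print(df.set_index(['facility', 'count']))

The reason this keeps the other columns is that transform returns one value per original row instead of collapsing each group to a single row the way agg or count does, so no merge back onto the original frame is needed.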
Upvotes: 2