Getting the maximum mode per group using groupby

Question

I have generated a table that shows the mode values of my dataset. The dataset was originally grouped by "date", "hour" and "room" in order to be able to get the mode value of foot traffic. The groupby was performed as follows:

dataframe = df.groupby([df['date'], df['hour'], df['room']])

Then I generated the mode value(s) for "traffic" in each group as follows:

dataframe = dataframe['traffic'].apply(lambda x: x.mode())

As a result, my dataframe displays the proper groups and shows the modal value per room, per hour and per day. My issue is that in certain cases there is more than one modal value (because 2 or 3 values have the same number of observations).
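To illustrate why a group can end up with several rows, here is a minimal sketch using made-up traffic readings for a single group (the values are hypothetical, not taken from my data). `Series.mode()` returns every value tied for the highest count, sorted in ascending order:

```python
import pandas as pd

# Hypothetical traffic readings for one (date, hour, room) group.
traffic = pd.Series([23, 26, 23, 26, 5])

# 23 and 26 both occur twice, so both are returned.
modes = traffic.mode()
print(modes.tolist())

# The largest of the tied modes is the value I would like to keep.
largest_mode = modes.max()
print(largest_mode)
```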

The current dataframe looks like this:

                            mode
date    hour    room        
6       12      room1   0   15  
                room2   0   23   
                        1   26  
                room3   0   1  
                        1   2
        13      room2   0   9
                        1   11
                room2   0   15

As you can see above, for room2 at 12:00 on January 6 there are 2 modal values (23 and 26).
Ideally, I would drop the lowest "mode" value(s) from each group that has more than one modal value.

I have looked at several approaches but cannot get any of them to work. I was thinking that the following would work:

dataframe.apply(lambda x: x[dataframe['mode'] == dataframe['mode'].max()])

I would then remove duplicates, but this does not affect the dataframe.

or

dataframe.filter(lambda x : x[dataframe['mode'] == dataframe['mode'].max()], dataframe['mode'])

which gives me a "'function' object is not iterable" error,

or

for elem in range(0, dataframe.size -1):      #to iterate over the dataframe rows
    if elem != dataframe['mode'].max():       #to identify rows that aren't max mode value
        dataframe = dataframe.drop([elem])    #to drop these rows
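For reference, the result I am after can be sketched on a small stand-in for the table above. This is only an illustration under assumptions: the stand-in is rebuilt by hand as a Series named "mode" with the three group levels plus an unnamed inner level, which may not match the exact shape of my real object. Grouping again on the three named index levels and taking the maximum keeps one row per group:

```python
import pandas as pd

# Hand-built stand-in for the grouped mode table (first rows only).
# The fourth, unnamed level is the 0/1 counter that mode() adds.
index = pd.MultiIndex.from_tuples(
    [
        (6, 12, "room1", 0),
        (6, 12, "room2", 0),
        (6, 12, "room2", 1),
        (6, 12, "room3", 0),
        (6, 12, "room3", 1),
    ],
    names=["date", "hour", "room", None],
)
modes = pd.Series([15, 23, 26, 1, 2], index=index, name="mode")

# Group on the three named levels and keep the largest mode in each group.
max_modes = modes.groupby(level=["date", "hour", "room"]).max()
print(max_modes)
```

Here room2 keeps 26 and room3 keeps 2, while room1 is unchanged because it only had one modal value.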

In response to Conner's request, here is the original CSV data (dataframe called "df"):

        room    time                 capacity   areaName    hour    date    traffic 
    0   room1   Mon Nov 02 09:00:00  40         area01      9       2       14
    1   room1   Mon Nov 02 09:05:00  40         area01      9       2       15
    2   room1   Mon Nov 02 09:10:00  80         area01      9       2       23  
    3   room1   Mon Nov 02 09:15:00  80         area01      9       2       23  
...
    14  room2   Mon Nov 02 11:00:00  40         area03      11      2       67  
    15  room2   Mon Nov 02 11:50:00  80         area03      11      2       64
    16  room2   Mon Nov 02 11:10:00  40         area03      11      2       72
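Working from data shaped like the sample above, a possible sketch is to take the largest mode directly inside the aggregation, so ties never produce extra rows. The frame below is rebuilt by hand from the visible rows only (0 to 3 and 14 to 16), keeping just the columns used for grouping; the elided rows are left out, so the numbers are illustrative rather than my real results:

```python
import pandas as pd

# Rebuilt from the visible sample rows; elided rows are not included.
df = pd.DataFrame(
    {
        "room": ["room1"] * 4 + ["room2"] * 3,
        "hour": [9, 9, 9, 9, 11, 11, 11],
        "date": [2] * 7,
        "traffic": [14, 15, 23, 23, 67, 64, 72],
    }
)

# For each (date, hour, room) group, compute all modes and keep the largest.
max_mode = (
    df.groupby(["date", "hour", "room"])["traffic"]
    .agg(lambda s: s.mode().max())
    .rename("mode")
)
print(max_mode)
```

Note that when every value in a group occurs once (as for room2 in this reduced sample), all values count as modes, so the result is simply the group maximum.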

If anyone knows a way to go through each group and keep only the maximum mode value when there are several, I would greatly appreciate it.

Thank you for your time!

-Romain
