user299791
user299791

Reputation: 2070

how to groupby hour in a pandas multiindex

I have a pandas multiindex with two indices, a data and a gender columns. It looks like this:

                  Division  North  South  West  East

               Date Gender     
2016-05-16 19:00:00      F      0      2     3     3
                         M     12     15    12    12
2016-05-16 20:00:00      F     12      9    11    11
                         M     10     13     8     9
2016-05-16 21:00:00      F      9      4     7     1
                         M      5      1    12    10

Now if I want to find the average values for each hour, I know I can do like:

df.groupby(df.index.hour).mean()

but this does not seem to work when you have a multi index. I found that I could do reach the Date index like:

df.groupby(df.index.get_level_values('Date').hour).mean()

which sort of averages over the 24 hours in a day, but I loose track of the Gender index...

so my question is: how can I find the average hourly values for each Division by Gender?

Upvotes: 3

Views: 1130

Answers (1)

jezrael
jezrael

Reputation: 862851

I think you can add level of MultiIndex, need pandas 0.20.1+:

df1 = df.groupby([df.index.get_level_values('Date').hour,'Gender']).mean()
print (df1)
             North  South  West  East
Date Gender                          
19   F           0      2     3     3
     M          12     15    12    12
20   F          12      9    11    11
     M          10     13     8     9
21   F           9      4     7     1
     M           5      1    12    10

Another solution:

df1 = df.groupby([df.index.get_level_values('Date').hour,
                  df.index.get_level_values('Gender')]).mean()
print (df1)
             North  South  West  East
Date Gender                          
19   F           0      2     3     3
     M          12     15    12    12
20   F          12      9    11    11
     M          10     13     8     9
21   F           9      4     7     1
     M           5      1    12    10

Or simply create columns from MultiIndex:

df = df.reset_index()
df1 = df.groupby([df['Date'].dt.hour, 'Gender']).mean()
print (df1)
             North  South  West  East
Date Gender                          
19   F           0      2     3     3
     M          12     15    12    12
20   F          12      9    11    11
     M          10     13     8     9
21   F           9      4     7     1
     M           5      1    12    10

Upvotes: 4

Related Questions