Bananach
Bananach

Reputation: 2311

Pandas group by time of day from Datetime multi-index level

I have a dataframe with a multi-index that contains a level named datetime which is a DatetimeIndex. I want to group my data by the time of day. Is it idiomatic to do so via

df.groupby(df.index.get_level_values('datetime').time).something()

? I'm asking because I'm not a fan of the .get_level_values('datetime') part of this. In particular: If datetime was not part of the index, I could more easily write

df.groupby(df.datetime.dt.time).something()

It was my impression that the point of treating columns as indices was to make these kind of operations more straightforward, so I'm surprised the opposite is the case here.

EDIT: I realized that if I could easily split my datetime level into a date level and a time level, I could do

# change df.index to have levels [date, time, x, y, z] instead of [datetime, x, y, z]
df.groupby(level='time').something()

This looks pretty optimally succinct, so if the splitting can be done elegantly, that would also answer my question.

Upvotes: 0

Views: 905

Answers (1)

jezrael
jezrael

Reputation: 862901

Is it idiomatic to do so via

df.groupby(df.index.get_level_values('datetime').time).median()

?

I think yes, if want attribute of MultiIndex level, like here DatetimeIndex.time with level name.

You can also use select level by position - here first level:

df.groupby(df.index.get_level_values(0).time).median()

Your edit solution should be simplify:

df.groupby(level='time').median()
df.groupby(level=1).median()
#some functions like sum, mean, median
df.median(level=1)

EDIT:

If possible, simpliest is split values to date and times and then create MultiIndex:

df = pd.DataFrame({
        'datetime': pd.to_datetime(['2000-01-05 15:00:00'] * 3 + ['2000-01-06'] * 3),
         'x':[4,5,4,5,5,4],
         'y':[7] * 6,
         'z':[1,3] * 3,
         'col':[5,3,6,9,2,4]
})

df['date'] = df['datetime'].dt.date
df['time'] = df['datetime'].dt.time

df = df.set_index(['date','time','x','y','z']).drop('datetime', axis=1)
print (df)
                           col
date       time     x y z     
2000-01-05 15:00:00 4 7 1    5
                    5 7 3    3
                    4 7 1    6
2000-01-06 00:00:00 5 7 3    9
                        1    2
                    4 7 3    4
                    

If input data has already datetimeindex in MultiIndex:

df1 = pd.DataFrame({
        'datetime': pd.to_datetime(['2000-01-05 15:00:00'] * 3 + ['2000-01-06'] * 3),
         'x':[4,5,4,5,5,4],
         'y':[7] * 6,
         'z':[1,3] * 3,
         'col':[5,3,6,9,2,4]
}).set_index(['datetime','x','y','z'])

print (df1)
                           col
datetime            x y z     
2000-01-05 15:00:00 4 7 1    5
                    5 7 3    3
                    4 7 1    6
2000-01-06 00:00:00 5 7 3    9
                        1    2
                    4 7 3    4

names = ['date','time','x','y','z']
df1.index = pd.MultiIndex.from_tuples([(d.date(), d.time(),a,b,c) 
                                       for d,a,b,c in df1.index], names=names)
print (df1)
                           col
date       time     x y z     
2000-01-05 15:00:00 4 7 1    5
                    5 7 3    3
                    4 7 1    6
2000-01-06 00:00:00 5 7 3    9
                        1    2
                    4 7 3    4

Upvotes: 1

Related Questions