Reputation: 2311
I have a dataframe with a multi-index that contains a level named datetime
which is a DatetimeIndex
. I want to group my data by the time of day. Is it idiomatic to do so via
df.groupby(df.index.get_level_values('datetime').time).something()
? I'm asking because I'm not a fan of the .get_level_values('datetime')
part of this. In particular: If datetime
was not part of the index, I could more easily write
df.groupby(df.datetime.dt.time).something()
It was my impression that the point of treating columns as indices was to make these kind of operations more straightforward, so I'm surprised the opposite is the case here.
EDIT:
I realized that if I could easily split my datetime
level into a date
level and a time
level, I could do
# change df.index to have levels [date, time, x, y, z] instead of [datetime, x, y, z]
df.groupby(level='time').something()
This looks pretty optimally succinct, so if the splitting can be done elegantly, that would also answer my question.
Upvotes: 0
Views: 905
Reputation: 862901
Is it idiomatic to do so via
df.groupby(df.index.get_level_values('datetime').time).median()
?
I think yes, if want attribute of MultiIndex
level, like here DatetimeIndex.time
with level name.
You can also use select level by position - here first level:
df.groupby(df.index.get_level_values(0).time).median()
Your edit solution should be simplify:
df.groupby(level='time').median()
df.groupby(level=1).median()
#some functions like sum, mean, median
df.median(level=1)
EDIT:
If possible, simpliest is split values to date and times and then create MultiIndex
:
df = pd.DataFrame({
'datetime': pd.to_datetime(['2000-01-05 15:00:00'] * 3 + ['2000-01-06'] * 3),
'x':[4,5,4,5,5,4],
'y':[7] * 6,
'z':[1,3] * 3,
'col':[5,3,6,9,2,4]
})
df['date'] = df['datetime'].dt.date
df['time'] = df['datetime'].dt.time
df = df.set_index(['date','time','x','y','z']).drop('datetime', axis=1)
print (df)
col
date time x y z
2000-01-05 15:00:00 4 7 1 5
5 7 3 3
4 7 1 6
2000-01-06 00:00:00 5 7 3 9
1 2
4 7 3 4
If input data has already datetimeindex
in MultiIndex
:
df1 = pd.DataFrame({
'datetime': pd.to_datetime(['2000-01-05 15:00:00'] * 3 + ['2000-01-06'] * 3),
'x':[4,5,4,5,5,4],
'y':[7] * 6,
'z':[1,3] * 3,
'col':[5,3,6,9,2,4]
}).set_index(['datetime','x','y','z'])
print (df1)
col
datetime x y z
2000-01-05 15:00:00 4 7 1 5
5 7 3 3
4 7 1 6
2000-01-06 00:00:00 5 7 3 9
1 2
4 7 3 4
names = ['date','time','x','y','z']
df1.index = pd.MultiIndex.from_tuples([(d.date(), d.time(),a,b,c)
for d,a,b,c in df1.index], names=names)
print (df1)
col
date time x y z
2000-01-05 15:00:00 4 7 1 5
5 7 3 3
4 7 1 6
2000-01-06 00:00:00 5 7 3 9
1 2
4 7 3 4
Upvotes: 1