Reputation: 5272
I don't understand how to group by time of day only, ignoring the date, in a pandas DataFrame.
MWE:
import pandas as pd
df = (
pd.DataFrame({'date': ['2018-01-01 12:10:00', '2018-01-02 12:11:00']})
.assign(
date=lambda df: df.date.astype('datetime64[ns]').dt.time,
foo='bar',
)
.groupby(pd.Grouper(key='date', freq='10T'))
.agg('count')
)
# throws TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
df = (
pd.DataFrame({'date': ['2018-01-01 12:10:00', '2018-01-02 12:11:00']})
.assign(
date=lambda df: df.date.astype('datetime64[ns]'),
foo='bar',
)
.groupby(pd.Grouper(key='date', freq='10T'))
.agg('count')
)
This works, but it takes the day into account.
Upvotes: 1
Views: 1322
Reputation: 863056
time in pandas is a Python object (datetime.time), so it does not work with Grouper. A possible solution is to select the time part with split and convert it to timedeltas with to_timedelta, or to convert it with to_datetime. The latter gives every row the same date, because a datetime always contains a date:
df = (
pd.DataFrame({'date': ['2018-01-01 12:10:00', '2018-01-02 12:11:00']})
.assign(date=lambda df: pd.to_timedelta(df.date.str.split().str[1]))
.groupby(pd.Grouper(key='date', freq='10T'))
.size()
)
print(df)
date
12:10:00 2
Freq: 10T, dtype: int64
Or:
df = (
pd.DataFrame({'date': ['2018-01-01 12:10:00', '2018-01-02 12:11:00']})
.assign(date=lambda df: pd.to_datetime(df.date.str.split().str[1]))
.groupby(pd.Grouper(key='date', freq='10T'))
.size()
)
print(df)
date
2019-01-15 12:10:00 2
Freq: 10T, dtype: int64
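To see why the original attempt fails while the timedelta version works, it can help to compare the dtypes. This is only a small sketch; the variable names (s, as_time, as_timedelta) are made up for illustration:

```python
import datetime

import pandas as pd

s = pd.Series(['2018-01-01 12:10:00', '2018-01-02 12:11:00'])

# .dt.time yields plain Python datetime.time objects, stored with object dtype
as_time = pd.to_datetime(s).dt.time

# taking the time part of the string and parsing it gives a real timedelta dtype
as_timedelta = pd.to_timedelta(s.str.split().str[1])

print(as_time.dtype)
print(isinstance(as_time.iloc[0], datetime.time))
print(pd.api.types.is_timedelta64_dtype(as_timedelta))
```

Grouper with freq needs a datetime-like column, which is why the object column of time values is rejected and the timedelta column is accepted.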
If the input column has datetime dtype, use strftime to convert it to strings:
df = (
pd.DataFrame({'date': pd.to_datetime(['2018-01-01 12:10:00', '2018-01-02 12:11:00'])})
.assign(date=lambda df: pd.to_datetime(df.date.dt.strftime('%H:%M:%S')))
.groupby(pd.Grouper(key='date', freq='10T'))
.size()
)
print(df)
date
2019-01-15 12:10:00 2
Freq: 10T, dtype: int64
Or:
df = (
pd.DataFrame({'date': pd.to_datetime(['2018-01-01 12:10:00', '2018-01-02 12:11:00'])})
.assign(date=lambda df: pd.to_datetime(df.date.dt.time.astype(str)))
.groupby(pd.Grouper(key='date', freq='10T'))
.size()
)
print(df)
date
2019-01-15 12:10:00 2
Freq: 10T, dtype: int64
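Another possible variant for a datetime column, not part of the solutions above, avoids the round trip through strings: subtract the normalized date (midnight) to get the time of day as a timedelta, floor it to 10 minutes and group by the result. This is a sketch under the assumption that empty 10-minute bins are not needed, since a plain groupby only returns buckets that occur in the data. The names time_of_day, bucket and counts, and the third sample row, are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime([
        '2018-01-01 12:10:00',
        '2018-01-02 12:11:00',
        '2018-01-03 12:25:00',
    ])
})

# time elapsed since midnight of the same day, as timedelta64
time_of_day = df['date'] - df['date'].dt.normalize()

# round each time of day down to the start of its 10-minute bucket
bucket = time_of_day.dt.floor('10min')

# count rows per bucket; only buckets present in the data appear
counts = df.groupby(bucket).size()
print(counts)
```

Here the first two rows fall into the 12:10 bucket even though they are on different days, and the third row gets its own 12:20 bucket.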
Upvotes: 3