ClementWalter

Reputation: 5272

Cannot groupby pd.Grouper on datetime.time object

I don't understand how to group by considering only the time in a pandas DataFrame.

MWE:

import pandas as pd

df = (
    pd.DataFrame({'date': ['2018-01-01 12:10:00', '2018-01-02 12:11:00']})
    .assign(
        date=lambda df: df.date.astype('datetime64').dt.time,
        foo='bar',
    )
    .groupby(pd.Grouper(key='date', freq='10T'))
    .agg('count')
)
# throws TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'

df = (
    pd.DataFrame({'date': ['2018-01-01 12:10:00', '2018-01-02 12:11:00']})
    .assign(
        date=lambda df: df.date.astype('datetime64'),
        foo='bar',
    )
    .groupby(pd.Grouper(key='date', freq='10T'))
    .agg('count')
)

This second version works, but it takes the day into account, so the two rows end up in different groups.

Upvotes: 1

Views: 1322

Answers (1)

jezrael

Reputation: 863056

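In pandas, time values are plain Python objects, so they do not work with Grouper. One possible solution is to extract the time part with split and convert it to timedeltas with to_timedelta. Another is to convert it with to_datetime; in that case every row gets the same date (the day the code is run), because a datetime always contains a date: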

df = (
    pd.DataFrame({'date': ['2018-01-01 12:10:00', '2018-01-02 12:11:00']})
    .assign(date=lambda df: pd.to_timedelta(df.date.str.split().str[1]))
    .groupby(pd.Grouper(key='date', freq='10T'))
    .size()
)
print (df)

date
12:10:00    2
Freq: 10T, dtype: int64

Or:

df = (
    pd.DataFrame({'date': ['2018-01-01 12:10:00', '2018-01-02 12:11:00']})
    .assign(date=lambda df: pd.to_datetime(df.date.str.split().str[1]))
    .groupby(pd.Grouper(key='date', freq='10T'))
    .size()
)
print (df)

date
2019-01-15 12:10:00    2
Freq: 10T, dtype: int64

If the input column already has a datetime dtype, use strftime to convert it to time-only strings first:

df = (
    pd.DataFrame({'date': pd.to_datetime(['2018-01-01 12:10:00', '2018-01-02 12:11:00'])})
    .assign(date=lambda df: pd.to_datetime(df.date.dt.strftime('%H:%M:%S')))
    .groupby(pd.Grouper(key='date', freq='10T'))
    .size()
)
print (df)

date
2019-01-15 12:10:00    2
Freq: 10T, dtype: int64

Or:

df = (
    pd.DataFrame({'date': pd.to_datetime(['2018-01-01 12:10:00', '2018-01-02 12:11:00'])})
    .assign(date=lambda df: pd.to_datetime(df.date.dt.time.astype(str)))
    .groupby(pd.Grouper(key='date', freq='10T'))
    .size()
)
print (df)

date
2019-01-15 12:10:00    2
Freq: 10T, dtype: int64
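
For a column that already has a datetime dtype, the string round-trip can probably be avoided. The following is a minimal sketch, not part of the original answer, showing two variations on the same idea: subtracting the normalized (midnight) timestamp to get a timedelta for Grouper, or flooring to 10 minutes and grouping on the resulting time objects with a plain groupby. The names since_midnight, by_timedelta and by_time are only illustrative, and the alias '10min' is used here on the assumption of a recent pandas where '10T' is deprecated.

```python
import datetime
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2018-01-01 12:10:00', '2018-01-02 12:11:00']),
    'foo': 'bar',
})

# .dt.time yields plain datetime.time objects, so the column dtype is object;
# this is why pd.Grouper(freq=...) rejects it.
time_dtype = df['date'].dt.time.dtype

# Variation 1: time of day as a timedelta since midnight, then Grouper as above.
since_midnight = df['date'] - df['date'].dt.normalize()
by_timedelta = (
    df.assign(date=since_midnight)
    .groupby(pd.Grouper(key='date', freq='10min'))
    .size()
)

# Variation 2: floor each timestamp to 10 minutes, then group on the
# datetime.time values directly (no Grouper needed, only non-empty bins appear).
by_time = df.groupby(df['date'].dt.floor('10min').dt.time).size()

print(by_timedelta)
print(by_time)
```

The second variation keeps datetime.time labels in the result index, which may be more convenient if the grouped output is meant to be read as clock times rather than durations.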

Upvotes: 3
