Pinti

Reputation: 113

Pandas Series groupby specific hours

I am looking for an efficient way to group a pandas Series by gas day (a convention used in natural gas trading). A gas day covers all hours/timestamps from 6AM to 6AM of the next day in the CET timezone. Because of daylight saving time, one gas day per year has 23 hours and one has 25 hours. My current solution works OK (see the to_gas_day function below), but it is terribly slow. Any ideas are appreciated.

import pandas as pd

def to_gas_day(stamp):
    """Take a time stamp and return date according to gas day (from 6 to 6 CET)."""
    if stamp.hour < 6:
        day = stamp.date() - pd.Timedelta(days=1)
    else:
        day = stamp.date()
    return pd.to_datetime(day)

se = pd.Series(
    data = 1.,
    index=pd.date_range('2020-10-23','2020-10-27', freq='H', tz='CET')[:-1]
    )

# This is expected count of hours around DST date
se.groupby(to_gas_day).count()

Out[107]: 
2020-10-22     6
2020-10-23    24
2020-10-24    25
2020-10-25    24
2020-10-26    18
dtype: int64

Upvotes: 1

Views: 140

Answers (1)

Ferris

Reputation: 5611

Is this equivalent to your code?

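import numpy as np
import pandas as pd

obj = pd.Series(pd.date_range('2020-10-23','2020-10-27', freq='H', tz='CET')[:-1])

# Hours before 6AM local time belong to the previous gas day
cond = obj.dt.hour < 6
obj2 = np.where(cond,
                obj.dt.date - pd.Timedelta(days=1),
                obj.dt.date)
obj2 = pd.Series(obj2)

# Count the hours that fall into each gas day
obj3 = obj2.value_counts().sort_index()
print(obj3)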

    2020-10-22     6
    2020-10-23    24
    2020-10-24    25
    2020-10-25    24
    2020-10-26    18
    dtype: int64
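
A possibly faster variant, offered as a sketch rather than a benchmarked solution: drop the timezone to get local wall-clock times, shift them back by 6 hours, and truncate to midnight. The shift has to happen on wall-clock time, since subtracting 6 hours from the tz-aware index is done in absolute time and would put the hours just before 6AM on a DST-change day into the wrong gas day. The names local_wall, gas_day and counts are only illustrative, and the hourly step is passed as a Timedelta on the assumption that this avoids differences in frequency alias spelling between pandas versions.

```python
import pandas as pd

# Same sample data as in the question: hourly stamps in CET across the
# October 2020 DST change.
se = pd.Series(
    data=1.0,
    index=pd.date_range("2020-10-23", "2020-10-27",
                        freq=pd.Timedelta(hours=1), tz="CET")[:-1],
)

# Strip the timezone but keep the local wall-clock time.
local_wall = se.index.tz_localize(None)

# Shift by 6 wall-clock hours, then truncate to midnight: hours before
# 6AM land on the previous calendar day, which is the gas day.
gas_day = (local_wall - pd.Timedelta(hours=6)).normalize()

counts = se.groupby(gas_day).count()
print(counts)
```

This uses only vectorized index operations, so it avoids calling a Python function once per timestamp.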

Upvotes: 2
