Reputation: 113
I am looking for an efficient way to group a pandas Series by gas day (a convention used in natural gas trading).
A gas day covers all hours/timestamps from 6 AM to 6 AM of the next day in the CET timezone.
Because of daylight saving time, one gas day per year has 23 hours and another has 25 hours.
My current solution works (see the to_gas_day function below), but it is terribly slow.
Any ideas are appreciated.
import pandas as pd

def to_gas_day(stamp):
    """Take a time stamp and return date according to gas day (from 6 to 6 CET)."""
    if stamp.hour < 6:
        day = stamp.date() - pd.Timedelta(days=1)
    else:
        day = stamp.date()
    return pd.to_datetime(day)

se = pd.Series(
    data=1.,
    index=pd.date_range('2020-10-23', '2020-10-27', freq='H', tz='CET')[:-1]
)
# This is the expected count of hours per gas day around the DST change
se.groupby(to_gas_day).count()
Out[107]:
2020-10-22 6
2020-10-23 24
2020-10-24 25
2020-10-25 24
2020-10-26 18
dtype: int64
Upvotes: 1
Views: 140
Reputation: 5611
Is this equivalent to your code?
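import numpy as np
import pandas as pd

obj = pd.Series(pd.date_range('2020-10-23','2020-10-27', freq='H', tz='CET')[:-1])
# Timestamps before 06:00 local time belong to the previous gas day
cond = obj.dt.hour < 6
obj2 = np.where(cond,
                obj.dt.date - pd.Timedelta(days=1),
                obj.dt.date)
obj2 = pd.Series(obj2)
# Count the hours per gas day
obj3 = obj2.value_counts().sort_index()
print(obj3)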
2020-10-22 6
2020-10-23 24
2020-10-24 25
2020-10-25 24
2020-10-26 18
dtype: int64
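Another option that might be worth timing: do the whole thing on the index at once, without going through Python date objects. The sketch below assumes the gas day is defined on local wall-clock time (06:00 to 06:00 as shown on a clock in CET), so it drops the timezone first, shifts back by 6 hours and truncates to midnight. The helper name gas_day_index is made up for this example, and freq='h' is used instead of 'H' only because newer pandas versions warn about the uppercase alias.

```python
import pandas as pd

def gas_day_index(index):
    """Map a tz-aware DatetimeIndex to gas-day labels (hypothetical helper)."""
    # Drop the timezone but keep the local wall-clock time, so the
    # 6-hour shift below is not distorted by the DST change.
    wall = index.tz_localize(None)
    # Hours 00:00-05:59 fall back to the previous calendar day;
    # normalize() then truncates every timestamp to midnight.
    return (wall - pd.Timedelta(hours=6)).normalize()

idx = pd.date_range('2020-10-23', '2020-10-27', freq='h', tz='CET')[:-1]
se = pd.Series(1.0, index=idx)

# Group by the precomputed labels instead of calling a function per timestamp
counts = se.groupby(gas_day_index(se.index)).count()
print(counts)
```

Note that subtracting 6 hours from the tz-aware index directly would shift in absolute time, which puts the early-morning hours after the DST change on the wrong day. That is why the timezone is removed before the shift.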
Upvotes: 2