Reputation: 209
I have pulled some data from the internet which is basically 2 columns of hourly data for a whole year:
france.GetData(base_scenario, utils.enumerate_periods(start,end,'H','CET'))
output
2015-12-31 23:00:00+00:00 23.86
2016-01-01 00:00:00+00:00 22.39
2016-01-01 01:00:00+00:00 20.59
2016-01-01 02:00:00+00:00 16.81
2016-01-01 03:00:00+00:00 17.41
2016-01-01 04:00:00+00:00 17.02
2016-01-01 05:00:00+00:00 15.86...
I want to add two more indicator columns: a 'peak' hour column and an 'off peak' hour column. If the time of day is between 08:00 and 18:00 there should be a 1 in the peak column, and outside these hours there should be a 1 in the off peak column.
Could anyone please explain how to do this?
Many thanks
Upvotes: 2
Views: 2010
Reputation: 862671
I think you can use to_datetime if the index is not a DatetimeIndex, then use between_time to create column peak and test it with notnull: a NaN gives False and any value gives True. The boolean values are then converted to int (False -> 0 and True -> 1) by astype, and finally column peak-off is derived from column peak (thanks Quickbeam2k1):
df = pd.DataFrame({'col': {'2016-01-01 01:00:00+00:00': 20.59, '2016-01-01 07:00:00+00:00': 15.86, '2016-01-01 10:00:00+00:00': 15.86, '2016-01-01 09:00:00+00:00': 15.86, '2016-01-01 02:00:00+00:00': 16.81, '2016-01-01 03:00:00+00:00': 17.41, '2016-01-01 05:00:00+00:00': 15.86, '2016-01-01 04:00:00+00:00': 17.02, '2016-01-01 08:00:00+00:00': 15.86, '2015-12-31 23:00:00+00:00': 23.86, '2016-01-01 18:00:00+00:00': 15.86, '2016-01-01 06:00:00+00:00': 15.86, '2016-01-01 00:00:00+00:00': 22.39}})
print (df)
col
2015-12-31 23:00:00+00:00 23.86
2016-01-01 00:00:00+00:00 22.39
2016-01-01 01:00:00+00:00 20.59
2016-01-01 02:00:00+00:00 16.81
2016-01-01 03:00:00+00:00 17.41
2016-01-01 04:00:00+00:00 17.02
2016-01-01 05:00:00+00:00 15.86
2016-01-01 06:00:00+00:00 15.86
2016-01-01 07:00:00+00:00 15.86
2016-01-01 08:00:00+00:00 15.86
2016-01-01 09:00:00+00:00 15.86
2016-01-01 10:00:00+00:00 15.86
2016-01-01 18:00:00+00:00 15.86
print (df.index)
Index(['2015-12-31 23:00:00+00:00', '2016-01-01 00:00:00+00:00',
'2016-01-01 01:00:00+00:00', '2016-01-01 02:00:00+00:00',
'2016-01-01 03:00:00+00:00', '2016-01-01 04:00:00+00:00',
'2016-01-01 05:00:00+00:00', '2016-01-01 06:00:00+00:00',
'2016-01-01 07:00:00+00:00', '2016-01-01 08:00:00+00:00',
'2016-01-01 09:00:00+00:00', '2016-01-01 10:00:00+00:00',
'2016-01-01 18:00:00+00:00'],
dtype='object')
df.index = pd.to_datetime(df.index)
print (df.index)
DatetimeIndex(['2015-12-31 23:00:00', '2016-01-01 00:00:00',
'2016-01-01 01:00:00', '2016-01-01 02:00:00',
'2016-01-01 03:00:00', '2016-01-01 04:00:00',
'2016-01-01 05:00:00', '2016-01-01 06:00:00',
'2016-01-01 07:00:00', '2016-01-01 08:00:00',
'2016-01-01 09:00:00', '2016-01-01 10:00:00',
'2016-01-01 18:00:00'],
dtype='datetime64[ns]', freq=None)
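# keep only the rows between 08:00 and 18:00; all other rows become NaN after index alignment
df['peak'] = df['col'].between_time('08:00', '18:00')
# non-NaN -> True -> 1, NaN -> False -> 0
df['peak'] = df['peak'].notnull().astype(int)
df['peak-off'] = 1 - df['peak']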
print (df)
col peak peak-off
2015-12-31 23:00:00 23.86 0 1
2016-01-01 00:00:00 22.39 0 1
2016-01-01 01:00:00 20.59 0 1
2016-01-01 02:00:00 16.81 0 1
2016-01-01 03:00:00 17.41 0 1
2016-01-01 04:00:00 17.02 0 1
2016-01-01 05:00:00 15.86 0 1
2016-01-01 06:00:00 15.86 0 1
2016-01-01 07:00:00 15.86 0 1
2016-01-01 08:00:00 15.86 1 0
2016-01-01 09:00:00 15.86 1 0
2016-01-01 10:00:00 15.86 1 0
2016-01-01 18:00:00 15.86 1 0
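One caveat with the notnull test above: if col itself holds a NaN inside the peak window, that row would be flagged as off-peak. A minimal sketch of a variant that works on row positions instead, using DatetimeIndex.indexer_between_time. The small frame below is made-up sample data, not the asker's real series:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the NaN at 09:00 falls inside the peak window.
idx = pd.to_datetime(["2016-01-01 07:00", "2016-01-01 08:00",
                      "2016-01-01 09:00", "2016-01-01 18:00",
                      "2016-01-01 19:00"])
df = pd.DataFrame({"col": [15.86, 15.86, np.nan, 15.86, 15.86]}, index=idx)

# Integer positions of the rows whose time of day lies in 08:00-18:00
# (both ends are included by default).
peak_pos = df.index.indexer_between_time("08:00", "18:00")

# Start from all zeros and switch the peak positions to 1.
peak = np.zeros(len(df), dtype=int)
peak[peak_pos] = 1

df["peak"] = peak
df["peak-off"] = 1 - peak
print(df)
```

Because the flags come from the index alone, missing values in col do not affect them.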
Another solution is to build a boolean mask from the conditions first and then convert it to int; to invert the mask use ~:
from datetime import time
# pd.datetime was removed in pandas 2.0; the standard library's datetime.time does the same job
h1 = time(8, 0)
h2 = time(18, 0)
times = df.index.time
mask = (times >= h1) & (times <= h2)
df['peak'] = mask.astype(int)
df['peak-off'] = (~mask).astype(int)
print (df)
col peak peak-off
2015-12-31 23:00:00 23.86 0 1
2016-01-01 00:00:00 22.39 0 1
2016-01-01 01:00:00 20.59 0 1
2016-01-01 02:00:00 16.81 0 1
2016-01-01 03:00:00 17.41 0 1
2016-01-01 04:00:00 17.02 0 1
2016-01-01 05:00:00 15.86 0 1
2016-01-01 06:00:00 15.86 0 1
2016-01-01 07:00:00 15.86 0 1
2016-01-01 08:00:00 15.86 1 0
2016-01-01 09:00:00 15.86 1 0
2016-01-01 10:00:00 15.86 1 0
2016-01-01 18:00:00 15.86 1 0
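If the data is strictly hourly, the solution can be simpler: use DatetimeIndex.hour for the mask (h <= 18 matches the whole 18:00 hour, which is equivalent here because every timestamp falls on the hour):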
df.index = pd.to_datetime(df.index)
print (df.index)
h = df.index.hour
mask = (h >= 8) & (h <= 18)
df['peak'] = mask.astype(int)
df['peak-off'] = (~mask).astype(int)
print (df)
col peak peak-off
2015-12-31 23:00:00 23.86 0 1
2016-01-01 00:00:00 22.39 0 1
2016-01-01 01:00:00 20.59 0 1
2016-01-01 02:00:00 16.81 0 1
2016-01-01 03:00:00 17.41 0 1
2016-01-01 04:00:00 17.02 0 1
2016-01-01 05:00:00 15.86 0 1
2016-01-01 06:00:00 15.86 0 1
2016-01-01 07:00:00 15.86 0 1
2016-01-01 08:00:00 15.86 1 0
2016-01-01 09:00:00 15.86 1 0
2016-01-01 10:00:00 15.86 1 0
2016-01-01 18:00:00 15.86 1 0
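The timestamps in the question carry a +00:00 (UTC) offset, while the periods were requested in CET. If the 08:00 to 18:00 window is meant in French local time (an assumption; the question does not say), the index could be converted before taking the hour. A rough sketch with made-up sample rows, where "Europe/Paris" is my guess at the relevant zone:

```python
import pandas as pd

# Hypothetical sample rows, parsed as timezone-aware UTC timestamps.
idx = pd.to_datetime(["2016-01-01 06:00", "2016-01-01 07:00",
                      "2016-01-01 17:00", "2016-01-01 18:00"], utc=True)
df = pd.DataFrame({"col": [15.86, 15.86, 15.86, 15.86]}, index=idx)

# Hour of day in local time (assumed zone: Europe/Paris, UTC+1 in January).
local_hour = df.index.tz_convert("Europe/Paris").hour

mask = (local_hour >= 8) & (local_hour <= 18)
df["peak"] = mask.astype(int)
df["peak-off"] = (~mask).astype(int)
print(df)
```

Here 07:00 UTC is 08:00 in Paris, so it counts as peak, while 18:00 UTC is 19:00 local and counts as off-peak. If the window is defined in UTC, the conversion is unnecessary.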
Upvotes: 2