Ollie

Reputation: 209

Adding columns to a pandas Series based on time-of-day conditions

I have pulled some data from the internet. It is essentially two columns of hourly data (timestamp and value) for a whole year:

france.GetData(base_scenario, utils.enumerate_periods(start,end,'H','CET'))

output

2015-12-31 23:00:00+00:00    23.86
2016-01-01 00:00:00+00:00    22.39
2016-01-01 01:00:00+00:00    20.59
2016-01-01 02:00:00+00:00    16.81
2016-01-01 03:00:00+00:00    17.41
2016-01-01 04:00:00+00:00    17.02
2016-01-01 05:00:00+00:00    15.86...

I want to add two more columns, 'peak' and 'off peak', holding 0/1 values. If the time of day is between 0800 and 1800, there should be a 1 in the peak column; if it is outside these hours, there should be a 1 in the off peak column.

Could anyone please explain how to do this?

Many thanks

Upvotes: 2

Views: 2010

Answers (1)

jezrael

Reputation: 862671

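I think you can use to_datetime to convert the index if it is not already a DatetimeIndex. Then use between_time to select the rows from 08:00 to 18:00 into a column peak, and test that column with notnull: rows outside the window are NaN and give False, while rows inside have a value and give True. The boolean values are then converted to int with astype (False -> 0, True -> 1), and finally peak-off is computed from peak as 1 - peak (thanks Quickbeam2k1). In newer pandas versions, to_datetime keeps the +00:00 offset, so the index is timezone-aware and prints with the offset; the logic is the same: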

import pandas as pd

df = pd.DataFrame({'col': {'2015-12-31 23:00:00+00:00': 23.86,
                           '2016-01-01 00:00:00+00:00': 22.39,
                           '2016-01-01 01:00:00+00:00': 20.59,
                           '2016-01-01 02:00:00+00:00': 16.81,
                           '2016-01-01 03:00:00+00:00': 17.41,
                           '2016-01-01 04:00:00+00:00': 17.02,
                           '2016-01-01 05:00:00+00:00': 15.86,
                           '2016-01-01 06:00:00+00:00': 15.86,
                           '2016-01-01 07:00:00+00:00': 15.86,
                           '2016-01-01 08:00:00+00:00': 15.86,
                           '2016-01-01 09:00:00+00:00': 15.86,
                           '2016-01-01 10:00:00+00:00': 15.86,
                           '2016-01-01 18:00:00+00:00': 15.86}})
print (df)
                             col
2015-12-31 23:00:00+00:00  23.86
2016-01-01 00:00:00+00:00  22.39
2016-01-01 01:00:00+00:00  20.59
2016-01-01 02:00:00+00:00  16.81
2016-01-01 03:00:00+00:00  17.41
2016-01-01 04:00:00+00:00  17.02
2016-01-01 05:00:00+00:00  15.86
2016-01-01 06:00:00+00:00  15.86
2016-01-01 07:00:00+00:00  15.86
2016-01-01 08:00:00+00:00  15.86
2016-01-01 09:00:00+00:00  15.86
2016-01-01 10:00:00+00:00  15.86
2016-01-01 18:00:00+00:00  15.86
print (df.index)
Index(['2015-12-31 23:00:00+00:00', '2016-01-01 00:00:00+00:00',
       '2016-01-01 01:00:00+00:00', '2016-01-01 02:00:00+00:00',
       '2016-01-01 03:00:00+00:00', '2016-01-01 04:00:00+00:00',
       '2016-01-01 05:00:00+00:00', '2016-01-01 06:00:00+00:00',
       '2016-01-01 07:00:00+00:00', '2016-01-01 08:00:00+00:00',
       '2016-01-01 09:00:00+00:00', '2016-01-01 10:00:00+00:00',
       '2016-01-01 18:00:00+00:00'],
      dtype='object')

df.index = pd.to_datetime(df.index) 

print (df.index)
DatetimeIndex(['2015-12-31 23:00:00', '2016-01-01 00:00:00',
               '2016-01-01 01:00:00', '2016-01-01 02:00:00',
               '2016-01-01 03:00:00', '2016-01-01 04:00:00',
               '2016-01-01 05:00:00', '2016-01-01 06:00:00',
               '2016-01-01 07:00:00', '2016-01-01 08:00:00',
               '2016-01-01 09:00:00', '2016-01-01 10:00:00',
               '2016-01-01 18:00:00'],
              dtype='datetime64[ns]', freq=None)
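# select the 08:00-18:00 rows of 'col'; after index alignment, other rows become NaN
df['peak'] = df['col'].between_time('08:00', '18:00')
# NaN -> False -> 0, any value -> True -> 1
df['peak'] = df['peak'].notnull().astype(int)
df['peak-off'] = 1 - df['peak']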
print (df)
                       col  peak  peak-off
2015-12-31 23:00:00  23.86     0         1
2016-01-01 00:00:00  22.39     0         1
2016-01-01 01:00:00  20.59     0         1
2016-01-01 02:00:00  16.81     0         1
2016-01-01 03:00:00  17.41     0         1
2016-01-01 04:00:00  17.02     0         1
2016-01-01 05:00:00  15.86     0         1
2016-01-01 06:00:00  15.86     0         1
2016-01-01 07:00:00  15.86     0         1
2016-01-01 08:00:00  15.86     1         0
2016-01-01 09:00:00  15.86     1         0
2016-01-01 10:00:00  15.86     1         0
2016-01-01 18:00:00  15.86     1         0
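A related variant, not part of the original answer: DatetimeIndex.indexer_between_time returns the integer positions of the rows whose time of day falls in the window, so the 0/1 columns can be filled without the intermediate between_time column. This is a minimal sketch assuming a DatetimeIndex; the 24-hour sample data is hypothetical, and the column names mirror the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 24 hourly timestamps covering one day
idx = pd.date_range('2016-01-01 00:00', periods=24, freq='h')
df = pd.DataFrame({'col': np.arange(24, dtype=float)}, index=idx)

# Integer positions of the rows whose time of day is in the 08:00-18:00 window
# (both ends are inclusive by default)
pos = df.index.indexer_between_time('08:00', '18:00')

# Start with all zeros, then mark the peak positions with 1
peak = np.zeros(len(df), dtype=int)
peak[pos] = 1
df['peak'] = peak
df['peak-off'] = 1 - df['peak']

print(df.iloc[6:20])
```

Writing directly into a preallocated array avoids relying on index alignment, which matters if the index ever contains duplicate timestamps.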

Another solution is to first build a boolean mask from the conditions and then convert it to int; use ~ to invert the mask:

import datetime

# boundaries of the peak window (both inclusive below)
h1 = datetime.time(8, 0)
h2 = datetime.time(18, 0)
times = df.index.time
mask = (times >= h1) & (times <= h2)
df['peak'] = mask.astype(int)
df['peak-off'] = (~mask).astype(int)
print (df)
                       col  peak  peak-off
2015-12-31 23:00:00  23.86     0         1
2016-01-01 00:00:00  22.39     0         1
2016-01-01 01:00:00  20.59     0         1
2016-01-01 02:00:00  16.81     0         1
2016-01-01 03:00:00  17.41     0         1
2016-01-01 04:00:00  17.02     0         1
2016-01-01 05:00:00  15.86     0         1
2016-01-01 06:00:00  15.86     0         1
2016-01-01 07:00:00  15.86     0         1
2016-01-01 08:00:00  15.86     1         0
2016-01-01 09:00:00  15.86     1         0
2016-01-01 10:00:00  15.86     1         0
2016-01-01 18:00:00  15.86     1         0
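Both solutions above treat 18:00 itself as peak. If "between 0800 and 1800" should instead mean from 08:00 up to but not including 18:00, only the upper comparison changes. A minimal sketch, assuming that half-open reading and a hypothetical one-day hourly index:

```python
import datetime

import pandas as pd

# Hypothetical sample data: 24 hourly timestamps covering one day
idx = pd.date_range('2016-01-01 00:00', periods=24, freq='h')
df = pd.DataFrame({'col': 0.0}, index=idx)

start = datetime.time(8, 0)
end = datetime.time(18, 0)
times = df.index.time  # NumPy object array of datetime.time values

# Half-open window: 08:00 <= t < 18:00, so 18:00 counts as off-peak
mask = (times >= start) & (times < end)
df['peak'] = mask.astype(int)
df['peak-off'] = (~mask).astype(int)

print(df[['peak', 'peak-off']].sum())
```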

If the data is strictly hourly, the solution can be simpler: use DatetimeIndex.hour to build the mask:

df.index = pd.to_datetime(df.index) 
print (df.index)
h = df.index.hour
mask = (h >= 8) & (h <= 18)
df['peak'] = mask.astype(int)
df['peak-off'] = (~mask).astype(int)
print (df)
                       col  peak  peak-off
2015-12-31 23:00:00  23.86     0         1
2016-01-01 00:00:00  22.39     0         1
2016-01-01 01:00:00  20.59     0         1
2016-01-01 02:00:00  16.81     0         1
2016-01-01 03:00:00  17.41     0         1
2016-01-01 04:00:00  17.02     0         1
2016-01-01 05:00:00  15.86     0         1
2016-01-01 06:00:00  15.86     0         1
2016-01-01 07:00:00  15.86     0         1
2016-01-01 08:00:00  15.86     1         0
2016-01-01 09:00:00  15.86     1         0
2016-01-01 10:00:00  15.86     1         0
2016-01-01 18:00:00  15.86     1         0
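The question's data is a Series rather than a DataFrame, so it has to be turned into a DataFrame before columns can be added. A minimal sketch, assuming the Series has a DatetimeIndex; the name price and the generated values are hypothetical stand-ins for whatever france.GetData(...) returns:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Series returned by france.GetData(...):
# one value per hour for the whole of 2016 (a leap year, so 8784 hours)
idx = pd.date_range('2016-01-01 00:00', '2016-12-31 23:00', freq='h')
s = pd.Series(np.linspace(15.0, 25.0, len(idx)), index=idx, name='price')

# A Series has no columns, so convert it to a one-column DataFrame first
df = s.to_frame()

# Hourly data: peak if the hour is 8..18 inclusive, as in the answer above
h = df.index.hour
mask = (h >= 8) & (h <= 18)
df['peak'] = np.asarray(mask).astype(int)
df['peak-off'] = 1 - df['peak']

print(df.head(10))
print(df[['peak', 'peak-off']].sum())
```

If the real index is timezone-aware (the +00:00 suffix suggests UTC) and the peak window is meant in local time, the index would likely need converting with tz_convert before taking .hour; whether that applies depends on how peak hours are defined for this data.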

Upvotes: 2
