Reputation: 2253
I have 2 dfs:
threshold
id meter_point_id valid_from valid_until max_power_contractual
0 3 61 1969-12-31 23:00:00 2019-11-06 23:00:00 0
1 79 61 2019-11-07 00:00:00 2020-07-13 00:00:00 10
data
id ds time_series_id y
0 12858 2019-03-21 14:30:00 12858 49.25
1 12858 2019-03-21 14:15:00 12858 52.5
2 12858 2019-03-21 17:15:00 12858 49.25
3 12858 2019-03-21 13:45:00 12858 49.0
4 12858 2019-03-21 13:30:00 12858 51.75
5 22231 2019-11-11 12:00:00 22231 10.5
6 22231 2019-11-11 11:45:00 22231 10.0
7 22231 2019-11-12 09:45:00 22231 10.0
8 22231 2019-11-14 21:45:00 22231 9.5
I want to check every date in ds
column in the data
df against the valid_from
and valid_until
dates in the threshold
columns. If a date in the ds
lies in the date range 1969-12-31 23:00:00
-2019-11-06 23:00:00
, I want to obtain the max_power_contractual
value from the threshold
df and create it as a new column in the data
df.
Expected output
id ds time_series_id y max_power_contractual
0 12858 2019-03-21 14:30:00 12858 49.25 0
1 12858 2019-03-21 14:15:00 12858 52.5 0
2 12858 2019-03-21 17:15:00 12858 49.25 0
3 12858 2019-03-21 13:45:00 12858 49.0 0
4 12858 2019-03-21 13:30:00 12858 51.75 0
5 22231 2019-11-11 12:00:00 22231 10.5 10
6 22231 2019-11-11 11:45:00 22231 10.0 10
7 22231 2019-11-12 09:45:00 22231 10.0 10
8 22231 2019-11-14 21:45:00 22231 9.5 10
In the first 5 rows of the data
df, the ds
values match with the date range in the first row of threshold
df, so since the max_power_contractual
for that date range is 0, I want that value in the data
df. Similarly, the last rows 4 of the data
df, the ds
values match with the date range in the second row of threshold
df, so since the max_power_contractual
for that date range is 10, I want that value in the data
df.
How can this be done?
Thanks
Upvotes: 3
Views: 253
Reputation: 71689
First convert the date like columns in both the dataframes to pandas datetime
series using pd.to_datetime
:
threshold['valid_from'] = pd.to_datetime(threshold['valid_from'])
threshold['valid_until'] = pd.to_datetime(threshold['valid_until'])
data['ds'] = pd.to_datetime(data['ds'])
Then use:
idx = pd.IntervalIndex.from_arrays(threshold['valid_from'], threshold['valid_until'])
mapping = threshold.set_index(idx)['max_power_contractual']
data['max_power_contractual'] = data['ds'].map(mapping)
Details:
STEP A: Create a pd.IntervalIndex
from the valid_from
and valid_until
columns, this IntervalIndex will be used in STEP C
for mapping the column ds
:
# print(idx)
IntervalIndex([(1969-12-31 23:00:00, 2019-11-06 23:00:00], (2019-11-07, 2020-07-13]],
closed='right',
dtype='interval[datetime64[ns]]')
STEP B: Use .set_index
to set the index of max_power_contractual
series to this interval index idx
:
# print(mapping)
(1969-12-31 23:00:00, 2019-11-06 23:00:00] 0
(2019-11-07, 2020-07-13] 10
Name: max_power_contractual, dtype: int64
STEP C: Finally use Series.map
to map the column ds
with this mapping
series and assign it to the new column in data
.
# print(data)
id ds time_series_id y max_power_contractual
0 12858 2019-03-21 14:30:00 12858 49.25 0
1 12858 2019-03-21 14:15:00 12858 52.50 0
2 12858 2019-03-21 17:15:00 12858 49.25 0
3 12858 2019-03-21 13:45:00 12858 49.00 0
4 12858 2019-03-21 13:30:00 12858 51.75 0
5 22231 2019-11-11 12:00:00 22231 10.50 10
6 22231 2019-11-11 11:45:00 22231 10.00 10
7 22231 2019-11-12 09:45:00 22231 10.00 10
8 22231 2019-11-14 21:45:00 22231 9.50 10
Upvotes: 2