Reputation: 2978
I have the following two pandas DataFrames (the column names are the same, but df1 contains the data for 2019 and df2 contains the data for 2020):
df1
DateTime Col1 Col2 Col3
2019-10-14 00:00:00 5.35 201.75 16.50
2019-10-14 00:10:00 4.90 187.65 16.49
2019-10-14 00:20:00 4.80 186.89 16.48
2019-10-14 00:30:00 5.12 177.48 16.46
2019-10-14 00:40:00 5.83 332.94 6.45
df2
DateTime Col1 Col2 Col3
2020-10-14 00:00:00 5.35 231.75 14.50
2020-10-14 00:10:00 6.90 217.15 14.49
2020-10-14 00:20:00 6.80 181.69 14.46
2020-10-14 00:30:00 7.12 175.48 14.48
2020-10-14 00:40:00 5.83 212.95 15.45
I need to replace the rows of df1 between 2019-10-14 00:10:00 and 2019-10-14 00:30:00 (inclusive) with the corresponding rows of df2, i.e. those between 2020-10-14 00:10:00 and 2020-10-14 00:30:00.
This is the expected result (the 2nd, 3rd and 4th rows are taken from df2, while the DateTime values remain from df1):
DateTime Col1 Col2 Col3
2019-10-14 00:00:00 5.35 201.75 16.50
2019-10-14 00:10:00 6.90 217.15 14.49
2019-10-14 00:20:00 6.80 181.69 14.46
2019-10-14 00:30:00 7.12 175.48 14.48
2019-10-14 00:40:00 5.83 332.94 6.45
How can I do it? Is it possible to apply pandas' "where" here?
Upvotes: 2
Views: 235
Reputation: 863291
A solution that matches on a DatetimeIndex: subtract one year from the selected 2020 dates of df2 so that they line up with the 2019 rows of df1, then combine the two DataFrames:
df22 = df2.set_index('DateTime').loc['2020-10-14 00:10:00':'2020-10-14 00:30:00']
df1 = df1.set_index('DateTime')
df = df22.rename(lambda x: x - pd.DateOffset(years=1)).combine_first(df1)
print (df)
Col1 Col2 Col3
DateTime
2019-10-14 00:00:00 5.35 201.75 16.50
2019-10-14 00:10:00 6.90 217.15 14.49
2019-10-14 00:20:00 6.80 181.69 14.46
2019-10-14 00:30:00 7.12 175.48 14.48
2019-10-14 00:40:00 5.83 332.94 6.45
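For a reproducible run, the combine_first approach can be sketched end to end as below. The frames are rebuilt from the sample data in the question, the DateTime column is assumed to be parsed with pd.to_datetime, and the names `times`, `window` and `result` are only illustrative.

```python
import pandas as pd

times = ["00:00:00", "00:10:00", "00:20:00", "00:30:00", "00:40:00"]
df1 = pd.DataFrame({
    "DateTime": pd.to_datetime(["2019-10-14 " + t for t in times]),
    "Col1": [5.35, 4.90, 4.80, 5.12, 5.83],
    "Col2": [201.75, 187.65, 186.89, 177.48, 332.94],
    "Col3": [16.50, 16.49, 16.48, 16.46, 6.45],
})
df2 = pd.DataFrame({
    "DateTime": pd.to_datetime(["2020-10-14 " + t for t in times]),
    "Col1": [5.35, 6.90, 6.80, 7.12, 5.83],
    "Col2": [231.75, 217.15, 181.69, 175.48, 212.95],
    "Col3": [14.50, 14.49, 14.46, 14.48, 15.45],
})

# Select the 2020 window (label slicing on a DatetimeIndex includes both ends).
window = df2.set_index("DateTime").loc["2020-10-14 00:10:00":"2020-10-14 00:30:00"]
# Shift the selected labels back one year so they match df1's 2019 index.
window = window.rename(lambda ts: ts - pd.DateOffset(years=1))
# Values from `window` take priority; df1 fills in every other row.
result = window.combine_first(df1.set_index("DateTime"))
print(result)
```

Unlike update, combine_first returns a new DataFrame and leaves df1 untouched.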
Alternative with update:
df22 = df2.set_index('DateTime').loc['2020-10-14 00:10:00':'2020-10-14 00:30:00']
df1 = df1.set_index('DateTime')
df22 = df22.rename(lambda x: x - pd.DateOffset(years=1))
df1.update(df22)
print (df1)
Col1 Col2 Col3
DateTime
2019-10-14 00:00:00 5.35 201.75 16.50
2019-10-14 00:10:00 6.90 217.15 14.49
2019-10-14 00:20:00 6.80 181.69 14.46
2019-10-14 00:30:00 7.12 175.48 14.48
2019-10-14 00:40:00 5.83 332.94 6.45
Upvotes: 3
Reputation: 261974
You can use where:
df1['DateTime'] = pd.to_datetime(df1['DateTime'])
df2.where(df1['DateTime'].between('2019-10-14 00:10:00',
                                  '2019-10-14 00:30:00'),
          df1)
or mask:
df1['DateTime'] = pd.to_datetime(df1['DateTime'])
df1.mask(df1['DateTime'].between('2019-10-14 00:10:00',
                                 '2019-10-14 00:30:00'),
         df2)
Output (note that the DateTime values of the replaced rows are also taken from df2):
DateTime Col1 Col2 Col3
0 2019-10-14 00:00:00 5.35 201.75 16.50
1 2020-10-14 00:10:00 6.90 217.15 14.49
2 2020-10-14 00:20:00 6.80 181.69 14.46
3 2020-10-14 00:30:00 7.12 175.48 14.48
4 2019-10-14 00:40:00 5.83 332.94 6.45
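Because where and mask act on every column, the replaced rows in the output above carry the 2020 timestamps. If the 2019 DateTime values should stay, one option is to restrict the replacement to the value columns. This is a minimal sketch, assuming both frames share the same default integer index and row order; `cols`, `in_window` and `result` are illustrative names, and the frames are rebuilt from the question's sample data.

```python
import pandas as pd

times = ["00:00:00", "00:10:00", "00:20:00", "00:30:00", "00:40:00"]
df1 = pd.DataFrame({
    "DateTime": pd.to_datetime(["2019-10-14 " + t for t in times]),
    "Col1": [5.35, 4.90, 4.80, 5.12, 5.83],
    "Col2": [201.75, 187.65, 186.89, 177.48, 332.94],
    "Col3": [16.50, 16.49, 16.48, 16.46, 6.45],
})
df2 = pd.DataFrame({
    "DateTime": pd.to_datetime(["2020-10-14 " + t for t in times]),
    "Col1": [5.35, 6.90, 6.80, 7.12, 5.83],
    "Col2": [231.75, 217.15, 181.69, 175.48, 212.95],
    "Col3": [14.50, 14.49, 14.46, 14.48, 15.45],
})

cols = ["Col1", "Col2", "Col3"]
# True for the df1 rows that should be replaced (both bounds included).
in_window = df1["DateTime"].between("2019-10-14 00:10:00", "2019-10-14 00:30:00")

result = df1.copy()
# Replace only the value columns; DateTime keeps df1's 2019 timestamps.
result[cols] = df1[cols].mask(in_window, df2[cols])
print(result)
```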
Upvotes: 2
Reputation: 71610
Try with loc:
>>> df1.loc[df1['DateTime'].ge('2019-10-14 00:10:00') & df1['DateTime'].le('2019-10-14 00:30:00'), ['Col1', 'Col2', 'Col3']] = df2
>>> df1
DateTime Col1 Col2 Col3
0 2019-10-14 00:00:00 5.35 201.75 16.50
1 2019-10-14 00:10:00 6.90 217.15 14.49
2 2019-10-14 00:20:00 6.80 181.69 14.46
3 2019-10-14 00:30:00 7.12 175.48 14.48
4 2019-10-14 00:40:00 5.83 332.94 6.45
>>>
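The loc assignment above relies on df1 and df2 sharing the same index labels, since pandas aligns the right-hand DataFrame on index and columns. If the two frames are indexed differently, a possible variant is to select the rows of each frame with its own date range and assign the raw values. This is only a sketch: it assumes both windows contain the same number of rows in the same order, and the shifted index given to df2 here is invented purely to show the difference.

```python
import pandas as pd

times = ["00:00:00", "00:10:00", "00:20:00", "00:30:00", "00:40:00"]
df1 = pd.DataFrame({
    "DateTime": pd.to_datetime(["2019-10-14 " + t for t in times]),
    "Col1": [5.35, 4.90, 4.80, 5.12, 5.83],
    "Col2": [201.75, 187.65, 186.89, 177.48, 332.94],
    "Col3": [16.50, 16.49, 16.48, 16.46, 6.45],
})
# Hypothetical: df2 carries index labels 10..14, so they do not match df1's 0..4.
df2 = pd.DataFrame({
    "DateTime": pd.to_datetime(["2020-10-14 " + t for t in times]),
    "Col1": [5.35, 6.90, 6.80, 7.12, 5.83],
    "Col2": [231.75, 217.15, 181.69, 175.48, 212.95],
    "Col3": [14.50, 14.49, 14.46, 14.48, 15.45],
}, index=range(10, 15))

cols = ["Col1", "Col2", "Col3"]
# Row selectors for each frame, using that frame's own year.
m1 = df1["DateTime"].between("2019-10-14 00:10:00", "2019-10-14 00:30:00")
m2 = df2["DateTime"].between("2020-10-14 00:10:00", "2020-10-14 00:30:00")

# to_numpy() drops the index, so values are assigned by position, not by label.
df1.loc[m1, cols] = df2.loc[m2, cols].to_numpy()
print(df1)
```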
Upvotes: 1