Fluxy

Reputation: 2978

In pandas, how to replace a sub-DataFrame with another sub-DataFrame

I have the following two pandas DataFrames (the column names are the same, but df1 contains the data for 2019 and df2 contains the data for 2020):

df1

DateTime            Col1    Col2      Col3
2019-10-14 00:00:00 5.35    201.75    16.50
2019-10-14 00:10:00 4.90    187.65    16.49
2019-10-14 00:20:00 4.80    186.89    16.48
2019-10-14 00:30:00 5.12    177.48    16.46
2019-10-14 00:40:00 5.83    332.94    6.45

df2

DateTime            Col1    Col2       Col3
2020-10-14 00:00:00 5.35    231.75    14.50
2020-10-14 00:10:00 6.90    217.15    14.49
2020-10-14 00:20:00 6.80    181.69    14.46
2020-10-14 00:30:00 7.12    175.48    14.48
2020-10-14 00:40:00 5.83    212.95    15.45

I need to replace the rows of df1 from 00:10:00 to 00:30:00 (inclusive) with the corresponding rows of df2, i.e. the df2 rows from 2020-10-14 00:10:00 to 2020-10-14 00:30:00.

This is the expected result (the values in the 2nd, 3rd and 4th rows are taken from df2, while the DateTime values are kept from df1):

DateTime            Col1    Col2      Col3
2019-10-14 00:00:00 5.35    201.75    16.50
2019-10-14 00:10:00 6.90    217.15    14.49
2019-10-14 00:20:00 6.80    181.69    14.46
2019-10-14 00:30:00 7.12    175.48    14.48
2019-10-14 00:40:00 5.83    332.94    6.45

How can I do it? Is it possible to use pandas' where here?
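To try the answers below, the two frames can be rebuilt from the data in the question. This sketch assumes DateTime is meant to be a datetime64 column (the question does not say whether it was parsed from strings), so the timestamps are generated with pd.date_range in 10-minute steps; the expected frame is typed in literally from the question.

```python
import pandas as pd

# Rebuild the question's example data; timestamps are 10-minute steps.
df1 = pd.DataFrame({
    'DateTime': pd.date_range('2019-10-14 00:00:00', periods=5, freq='10min'),
    'Col1': [5.35, 4.90, 4.80, 5.12, 5.83],
    'Col2': [201.75, 187.65, 186.89, 177.48, 332.94],
    'Col3': [16.50, 16.49, 16.48, 16.46, 6.45],
})
df2 = pd.DataFrame({
    'DateTime': pd.date_range('2020-10-14 00:00:00', periods=5, freq='10min'),
    'Col1': [5.35, 6.90, 6.80, 7.12, 5.83],
    'Col2': [231.75, 217.15, 181.69, 175.48, 212.95],
    'Col3': [14.50, 14.49, 14.46, 14.48, 15.45],
})

# Expected result from the question: rows 1-3 hold df2's values,
# while every DateTime stays from df1 (2019).
expected = pd.DataFrame({
    'DateTime': df1['DateTime'],
    'Col1': [5.35, 6.90, 6.80, 7.12, 5.83],
    'Col2': [201.75, 217.15, 181.69, 175.48, 332.94],
    'Col3': [16.50, 14.49, 14.46, 14.48, 6.45],
})
print(expected)
```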

Upvotes: 2

Views: 235

Answers (3)

jezrael

Reputation: 863291

A solution that matches on a DatetimeIndex: subtract one year from the 2020 dates so that they match the 2019 rows of the other DataFrame:

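import pandas as pd

# DateTime must be datetime64 so the index is a DatetimeIndex and the year shift works
df1['DateTime'] = pd.to_datetime(df1['DateTime'])
df2['DateTime'] = pd.to_datetime(df2['DateTime'])

df22 = df2.set_index('DateTime').loc['2020-10-14 00:10:00':'2020-10-14 00:30:00']
df1 = df1.set_index('DateTime')

# shift the 2020 labels back one year so they match df1's 2019 index
df = df22.rename(lambda x: x - pd.DateOffset(years=1)).combine_first(df1)
print(df)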
                     Col1    Col2   Col3
DateTime                                
2019-10-14 00:00:00  5.35  201.75  16.50
2019-10-14 00:10:00  6.90  217.15  14.49
2019-10-14 00:20:00  6.80  181.69  14.46
2019-10-14 00:30:00  7.12  175.48  14.48
2019-10-14 00:40:00  5.83  332.94   6.45
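A self-contained run of the combine_first approach, as a sketch that assumes the question's data with DateTime already parsed as datetime64 (the frames are rebuilt with pd.date_range, which is an assumption about how the data was loaded). combine_first keeps the non-null values of the calling frame (the shifted df2 window) and fills every other cell, including the rows the caller does not have, from df1; the result index is the union of both indexes.

```python
import pandas as pd

# Question's data, rebuilt with datetime64 timestamps (assumption).
df1 = pd.DataFrame({
    'DateTime': pd.date_range('2019-10-14 00:00:00', periods=5, freq='10min'),
    'Col1': [5.35, 4.90, 4.80, 5.12, 5.83],
    'Col2': [201.75, 187.65, 186.89, 177.48, 332.94],
    'Col3': [16.50, 16.49, 16.48, 16.46, 6.45],
})
df2 = pd.DataFrame({
    'DateTime': pd.date_range('2020-10-14 00:00:00', periods=5, freq='10min'),
    'Col1': [5.35, 6.90, 6.80, 7.12, 5.83],
    'Col2': [231.75, 217.15, 181.69, 175.48, 212.95],
    'Col3': [14.50, 14.49, 14.46, 14.48, 15.45],
})

# Label slicing on a DatetimeIndex includes both endpoints.
df22 = df2.set_index('DateTime').loc['2020-10-14 00:10:00':'2020-10-14 00:30:00']

# Move the 2020 labels back one year so they coincide with df1's labels.
df22 = df22.rename(lambda ts: ts - pd.DateOffset(years=1))

# df22's values win where present; all other cells come from df1.
result = df22.combine_first(df1.set_index('DateTime'))
print(result)
```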

Alternative with update, which modifies df1 in place:

df22 = df2.set_index('DateTime').loc['2020-10-14 00:10:00':'2020-10-14 00:30:00']
df1 = df1.set_index('DateTime')

df22 = df22.rename(lambda x: x - pd.DateOffset(years=1))

df1.update(df22)
print (df1)
                     Col1    Col2   Col3
DateTime                                
2019-10-14 00:00:00  5.35  201.75  16.50
2019-10-14 00:10:00  6.90  217.15  14.49
2019-10-14 00:20:00  6.80  181.69  14.46
2019-10-14 00:30:00  7.12  175.48  14.48
2019-10-14 00:40:00  5.83  332.94   6.45
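A minimal sketch of the rules that make update fit here, using small hypothetical frames (target and patch) rather than the question's data: update aligns on index and columns, changes the caller in place and returns None, and by default copies only non-NA values, so a NaN in the replacement rows leaves the original value untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical frames, only to illustrate DataFrame.update's behaviour.
target = pd.DataFrame({'Col1': [1.0, 2.0, 3.0]}, index=['a', 'b', 'c'])
patch = pd.DataFrame({'Col1': [20.0, np.nan]}, index=['b', 'c'])

ret = target.update(patch)  # works in place; the return value is None
print(ret)
print(target)  # 'b' becomes 20.0, 'c' keeps 3.0 because the patch value is NaN
```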

Upvotes: 3

mozway

Reputation: 261974

You can use where:

df1['DateTime'] = pd.to_datetime(df1['DateTime'])
df2.where(df1['DateTime'].between('2019-10-14 00:10:00',
                                  '2019-10-14 00:30:00'),
          df1)

or mask:

df1['DateTime'] = pd.to_datetime(df1['DateTime'])
df1.mask(df1['DateTime'].between('2019-10-14 00:10:00',
                                 '2019-10-14 00:30:00'),
         df2)

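output (note that rows 1 to 3 also take their DateTime from df2, so they show 2020 instead of the 2019 values in the expected result):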

              DateTime  Col1    Col2   Col3
0  2019-10-14 00:00:00  5.35  201.75  16.50
1  2020-10-14 00:10:00  6.90  217.15  14.49
2  2020-10-14 00:20:00  6.80  181.69  14.46
3  2020-10-14 00:30:00  7.12  175.48  14.48
4  2019-10-14 00:40:00  5.83  332.94   6.45
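If DateTime should stay from df1, as in the expected result, one variant is to apply mask to the value columns only. This is a sketch that rebuilds the question's frames with pd.date_range (an assumption about how DateTime was parsed) and, like the answer above, relies on df1 and df2 sharing the same default index 0 to 4, so rows are paired by position.

```python
import pandas as pd

# Question's data, rebuilt with datetime64 timestamps (assumption).
df1 = pd.DataFrame({
    'DateTime': pd.date_range('2019-10-14 00:00:00', periods=5, freq='10min'),
    'Col1': [5.35, 4.90, 4.80, 5.12, 5.83],
    'Col2': [201.75, 187.65, 186.89, 177.48, 332.94],
    'Col3': [16.50, 16.49, 16.48, 16.46, 6.45],
})
df2 = pd.DataFrame({
    'DateTime': pd.date_range('2020-10-14 00:00:00', periods=5, freq='10min'),
    'Col1': [5.35, 6.90, 6.80, 7.12, 5.83],
    'Col2': [231.75, 217.15, 181.69, 175.48, 212.95],
    'Col3': [14.50, 14.49, 14.46, 14.48, 15.45],
})

cols = ['Col1', 'Col2', 'Col3']
# between() includes both endpoints by default.
in_window = df1['DateTime'].between('2019-10-14 00:10:00', '2019-10-14 00:30:00')

# Only the value columns are masked, so DateTime keeps df1's 2019 stamps.
# df2 is matched to df1 by index label (0..4 in both frames).
result = df1.copy()
result[cols] = df1[cols].mask(in_window, df2[cols])
print(result)
```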

Upvotes: 2

U13-Forward

Reputation: 71610

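Try with loc. The assigned df2 is aligned on the index, so rows 1 to 3 of df2 fill rows 1 to 3 of df1, and DateTime is left untouched because it is not in the column list: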

>>> df1.loc[df1['DateTime'].ge('2019-10-14 00:10:00') & df1['DateTime'].le('2019-10-14 00:30:00'), ['Col1', 'Col2', 'Col3']] = df2
>>> df1
              DateTime  Col1    Col2   Col3
0  2019-10-14 00:00:00  5.35  201.75  16.50
1  2019-10-14 00:10:00  6.90  217.15  14.49
2  2019-10-14 00:20:00  6.80  181.69  14.46
3  2019-10-14 00:30:00  7.12  175.48  14.48
4  2019-10-14 00:40:00  5.83  332.94   6.45
>>> 
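The loc assignment depends on df2 having the same index labels as df1. A hedged sketch for the case where they differ (df2's index is replaced here with hypothetical labels 10 to 14 to simulate that): select the 2020 window in df2 with its own mask and assign the raw values with to_numpy(), which drops the labels so rows are paired by position. This assumes both windows contain the same number of rows, and the frames are rebuilt with pd.date_range as an assumption about the data types.

```python
import pandas as pd

# Question's data, rebuilt with datetime64 timestamps (assumption).
df1 = pd.DataFrame({
    'DateTime': pd.date_range('2019-10-14 00:00:00', periods=5, freq='10min'),
    'Col1': [5.35, 4.90, 4.80, 5.12, 5.83],
    'Col2': [201.75, 187.65, 186.89, 177.48, 332.94],
    'Col3': [16.50, 16.49, 16.48, 16.46, 6.45],
})
df2 = pd.DataFrame({
    'DateTime': pd.date_range('2020-10-14 00:00:00', periods=5, freq='10min'),
    'Col1': [5.35, 6.90, 6.80, 7.12, 5.83],
    'Col2': [231.75, 217.15, 181.69, 175.48, 212.95],
    'Col3': [14.50, 14.49, 14.46, 14.48, 15.45],
})
# Hypothetical: a different index, so label alignment would not pair the rows.
df2.index = [10, 11, 12, 13, 14]

cols = ['Col1', 'Col2', 'Col3']
m1 = df1['DateTime'].between('2019-10-14 00:10:00', '2019-10-14 00:30:00')
m2 = df2['DateTime'].between('2020-10-14 00:10:00', '2020-10-14 00:30:00')

# Both windows must have the same length; to_numpy() assigns by position.
assert m1.sum() == m2.sum()
df1.loc[m1, cols] = df2.loc[m2, cols].to_numpy()
print(df1)
```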

Upvotes: 1
