Reputation: 2871
I have two dataframes, df1 and df2, as shown below.
df1:
Date t_factor category
2020-02-01 5 A
2020-02-02 2 B
2020-02-03 1 C
2020-02-04 2 A
2020-02-05 3 B
2020-02-06 3 C
2020-02-07 3 A
2020-02-08 9 B
2020-02-09 1 C
2020-02-10 8 A
2020-02-11 3 B
2020-02-12 3 C
df2:
Date beta
2020-02-01 100
2020-02-02 230
2020-02-03 150
2020-02-04 100
2020-02-05 200
2020-02-06 180
2020-02-07 190
2020-02-08 290
From the above, I would like to replace the t_factor column of df1 with the beta column of df2 for the rows that fall within an input date range.
The function could look like this:
def replace_column(df1, df2, start_date='2020-02-03', end_date='2020-02-06'):
    df1 = df1.copy()
    df2 = df2.copy()
    df1 = df1.sort_values(['Date'], ascending=True)
    df2 = df2.sort_values(['Date'], ascending=True)
    # Missing step: set t_factor to df2's beta for rows with start_date <= Date <= end_date.
    return df1
Expected output for start_date = 2020-02-03 and end_date = 2020-02-06:
df1:
Date t_factor category
2020-02-01 5 A
2020-02-02 2 B
2020-02-03 150 C
2020-02-04 100 A
2020-02-05 200 B
2020-02-06 180 C
2020-02-07 3 A
2020-02-08 9 B
2020-02-09 1 C
2020-02-10 8 A
2020-02-11 3 B
2020-02-12 3 C
Note: df2 has less data than df1; the final date of df2 is 2020-02-08.
If start_date = `2020-02-07` and end_date = `2020-02-11`, then the expected output is:
Date t_factor category
2020-02-01 5 A
2020-02-02 2 B
2020-02-03 1 C
2020-02-04 2 A
2020-02-05 3 B
2020-02-06 3 C
2020-02-07 190 A
2020-02-08 290 B
2020-02-09 1 C
2020-02-10 8 A
2020-02-11 3 B
2020-02-12 3 C
print("df2 doesn't have data after 2020-02-08")
Upvotes: 1
Views: 72
Reputation: 71689
Use pd.to_datetime to convert the Date columns to pandas datetime series:
df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])
Then use Series.between, specifying the start date (left) and end date (right), to create a boolean mask m. Use boolean indexing with this mask together with Series.map to map the beta values from df2 onto the t_factor values in df1.
m = df1['Date'].between('2020-02-03', '2020-02-06')  # both endpoints are included by default
df1.loc[m, 't_factor'] = df1['Date'].map(df2.set_index('Date')['beta']).fillna(df1['t_factor'])
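For reference, the mapping approach can be run end to end on the sample data. This is a minimal sketch: the frames are retyped from the question, beta_by_date is a name introduced here for readability, and between is called without the inclusive argument because its default already includes both endpoints (recent pandas releases no longer accept the boolean form inclusive=True and expect a string such as 'both').

```python
import pandas as pd

# Sample data retyped from the question.
df1 = pd.DataFrame({
    'Date': pd.date_range('2020-02-01', periods=12, freq='D'),
    't_factor': [5, 2, 1, 2, 3, 3, 3, 9, 1, 8, 3, 3],
    'category': list('ABC') * 4,
})
df2 = pd.DataFrame({
    'Date': pd.date_range('2020-02-01', periods=8, freq='D'),
    'beta': [100, 230, 150, 100, 200, 180, 190, 290],
})

# Rows of df1 whose Date lies in the range (both endpoints included by default).
m = df1['Date'].between('2020-02-03', '2020-02-06')

# Look up beta by date; dates missing from df2 give NaN and fall back to t_factor.
beta_by_date = df1['Date'].map(df2.set_index('Date')['beta'])
df1.loc[m, 't_factor'] = beta_by_date.fillna(df1['t_factor'])

print(df1)
```

Only the four rows inside the range change; everything else, including the dates that df2 does not cover, keeps its original t_factor.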
Another idea, using DataFrame.merge:
df1 = df1.merge(df2, on='Date', how='left')
m = df1['Date'].between('2020-02-03', '2020-02-06')  # both endpoints are included by default
df1.loc[m, 't_factor'] = df1.pop('beta').fillna(df1['t_factor'])
Result:
# start=2020-02-03, end=2020-02-06
Date t_factor category
0 2020-02-01 5.0 A
1 2020-02-02 2.0 B
2 2020-02-03 150.0 C
3 2020-02-04 100.0 A
4 2020-02-05 200.0 B
5 2020-02-06 180.0 C
6 2020-02-07 3.0 A
7 2020-02-08 9.0 B
8 2020-02-09 1.0 C
9 2020-02-10 8.0 A
10 2020-02-11 3.0 B
11 2020-02-12 3.0 C
# start=2020-02-07, end=2020-02-11
Date t_factor category
0 2020-02-01 5.0 A
1 2020-02-02 2.0 B
2 2020-02-03 1.0 C
3 2020-02-04 2.0 A
4 2020-02-05 3.0 B
5 2020-02-06 3.0 C
6 2020-02-07 190.0 A
7 2020-02-08 290.0 B
8 2020-02-09 1.0 C
9 2020-02-10 8.0 A
10 2020-02-11 3.0 B
11 2020-02-12 3.0 C
Function that wraps the merge method (Method 2):
def fx(df1, df2, start, end):
    # Warn when the requested range runs past the last date available in df2.
    if df2['Date'].max() < pd.Timestamp(end):
        print(f"we don't have data beyond {df2['Date'].max()}")
    df1 = df1.merge(df2, on='Date', how='left')
    m = df1['Date'].between(start, end)  # both endpoints are included by default
    df1.loc[m, 't_factor'] = df1.pop('beta').fillna(df1['t_factor'])
    return df1
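To check the wrapper against the second case from the question, where the range runs past the end of df2, something like the following could be used. It is a sketch under a few assumptions: the name replace_in_range and the wording of the message are made up here, the sample data is retyped from the question, and the Date columns are already datetime.

```python
import pandas as pd

def replace_in_range(df1, df2, start, end):
    """Return a new frame with t_factor replaced by df2's beta for start <= Date <= end."""
    last = df2['Date'].max()
    if last < pd.Timestamp(end):
        print(f"df2 has no data after {last.date()}")
    out = df1.merge(df2, on='Date', how='left')  # merge returns a new frame
    in_range = out['Date'].between(start, end)   # both endpoints included by default
    # Rows without a matching beta keep their original t_factor.
    out.loc[in_range, 't_factor'] = out.pop('beta').fillna(out['t_factor'])
    return out

# Sample data retyped from the question.
df1 = pd.DataFrame({
    'Date': pd.date_range('2020-02-01', periods=12, freq='D'),
    't_factor': [5, 2, 1, 2, 3, 3, 3, 9, 1, 8, 3, 3],
    'category': list('ABC') * 4,
})
df2 = pd.DataFrame({
    'Date': pd.date_range('2020-02-01', periods=8, freq='D'),
    'beta': [100, 230, 150, 100, 200, 180, 190, 290],
})

result = replace_in_range(df1, df2, '2020-02-07', '2020-02-11')
print(result)
```

Because the merge produces a new frame, the df1 passed in is left untouched, and only 2020-02-07 and 2020-02-08 receive beta values since df2 ends there.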
Upvotes: 1
Reputation: 56
My solution uses the df.join and df.loc methods.
First initialize the data.
df1 = pd.DataFrame({'Date' : ['2020-02-01', '2020-02-05', '2020-02-06', '2020-02-12'],'t_factor' : [5, 3, 3, 3]})
df2 = pd.DataFrame({'Date' : ['2020-02-05', '2020-02-06'],'beta' : [200, 180]})
Then set Date as the index.
df1d = df1.set_index('Date')
df2d = df2.set_index('Date')
Now the key steps.
dfres=df1d.join(df2d)
dfres.loc[dfres['beta'].notnull(), 't_factor'] = dfres.loc[dfres['beta'].notnull()].beta
One more step to match the expected output.
output=dfres.drop(columns='beta')
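The steps above replace t_factor wherever df2 has a matching date, without looking at a date range. If the range from the question is also needed, one possible extension is sketched below. The start and end bounds are hypothetical values picked so that only one of the two matching dates qualifies, and the dates are parsed with pd.to_datetime first so the index can be compared against them.

```python
import pandas as pd

df1 = pd.DataFrame({'Date': ['2020-02-01', '2020-02-05', '2020-02-06', '2020-02-12'],
                    't_factor': [5, 3, 3, 3]})
df2 = pd.DataFrame({'Date': ['2020-02-05', '2020-02-06'], 'beta': [200, 180]})

# Parse the dates, then index both frames by Date.
df1d = df1.assign(Date=pd.to_datetime(df1['Date'])).set_index('Date')
df2d = df2.assign(Date=pd.to_datetime(df2['Date'])).set_index('Date')

dfres = df1d.join(df2d)  # left join on the Date index

# Hypothetical bounds for illustration.
start, end = pd.Timestamp('2020-02-06'), pd.Timestamp('2020-02-10')
in_range = (dfres.index >= start) & (dfres.index <= end)

# Replace only rows that are inside the range and have a beta value.
use_beta = dfres['beta'].notna() & in_range
dfres.loc[use_beta, 't_factor'] = dfres.loc[use_beta, 'beta']

output = dfres.drop(columns='beta')
print(output)
```

With these bounds only the 2020-02-06 row is replaced; 2020-02-05 has a beta value but lies outside the range, so it keeps its original t_factor.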
Upvotes: 1