Reputation: 159
I have two dataframes:
df1
with columns: id | date
and
df2
with columns: id| begin_date | end_date | var
I need to add var
column to df1
for conditions:
df1['id'] == df2['id']
&df2['begin_date'] <= df1['date'] < df2['end_date']
df2
meeting above conditions so take the last row.Expectet result: df1
with columns: id | date | var
.
Thanks for advices.
Upvotes: 0
Views: 65
Reputation: 472
You can this do in a single step but your data types need to be consistent before you start comparing.
df1['var'] = df1.apply(lambda x: df2.loc[max(df2[(x['id'] == df2['id']) & (df2['begin_date'] == x['date']) & (x['date'] < df2['end_date'])].index), 'var'], axis = 1)
Upvotes: 1
Reputation: 4130
as far as i understand your requirement,i think you can do it by two steps.
1.first - merge the two dataframes by id
merged_data_frame = pd.merge(df1,df2,how="inner",on=["id"],indicator=True)
2.second step
filter data by your date condition
merged_data_frame["date"] = pd.to_datetime(merged_data_frame["date"])
merged_data_frame["begin_date"] = pd.to_datetime(merged_data_frame["begin_date"])
merged_data_frame["end_date"] = pd.to_datetime(merged_data_frame["end_date"])
merged_data_frame = merged_data_frame[merged_data_frame.begin_date <= merged_data_frame.date]
merged_data_frame = merged_data_frame[merged_data_frame.date > merged_data_frame.end_date]
out=merged_data_frame.drop_duplicates(subset=["id"], keep='last')
Upvotes: 1