Reputation: 5823
Pandas currently doesn't support a conditional join in the SQL sense; however, one can be emulated (with lesser performance) by performing a larger join on common fields and then applying a filter on the post-join processing.
However, I'm looking for a way to preserve left (or right) elements through this conditional join. I lose them through the post-processing.
import pandas
# Create a dataframe
df = pandas.DataFrame([{'name': 'A', 'start': '20171201', 'end': '20180205'}, {'name': 'A', 'start': '20170901', 'end': '20170905'}, {'name': 'B', 'start': '20190101', 'end': '20190205'}])
df['start'] = pandas.to_datetime(df['start'])
df['end'] = pandas.to_datetime(df['end'])
print df
end name start
0 2018-02-05 A 2017-12-01
1 2017-09-05 A 2017-09-01
2 2019-02-05 B 2019-01-01
# Create another dataframe, don't want to lose any data here.
v_df = pandas.DataFrame([{'name': 'A', 'val': 10, 'date': '20180101'}, {'name': 'B', 'val': 20, 'date': '20170101'}])
v_df['date'] = pandas.to_datetime(v_df['date'])
print v_df
date name val
0 2018-01-01 A 10
1 2017-01-01 B 20
# Conditional Left Join both dataframes, want to avoid losing the name B val.
v_df = v_df.merge(df, how='left', on=['name'])
v_df = v_df[v_df['date'].between(v_df['start'], v_df['end'])]
print v_df
date name val end start
0 2018-01-01 A 10 2018-02-05 2017-12-01
Desired output in this case would be the following, which includes a left
record from B.
date name val end start
0 2018-01-01 A 10 2018-02-05 2017-12-01
2 2017-01-01 B 20 NaT NaT
A solution would also need to be able to handle when multiple records match in the general merge, but when none of that group match the condition, a null record is returned (rather than no record at all).
Upvotes: 1
Views: 497
Reputation: 2369
filter_df = df.merge(v_df)
filter_df = filter_df[filter_df['date'].between(filter_df['start'], filter_df['end'])]
v_df.merge(filter_df, how='left')
should return what you want.
The filter_df
identifies what rows a join properly works on, which you do in the question, and the second merge attaches these start
and end
values to only the relevant rows.
Edit:
@MaxU's answer is functionally equivalent and syntactically more elegant.
Upvotes: 2
Reputation: 210852
I'm not sure this is the best approach, but it seems to do the job:
In [191]: v_df.merge(v_df.merge(df).query("start <= date <= end"), how='left')
Out[191]:
date name val end start
0 2018-01-01 A 10 2018-02-05 2017-12-01
1 2017-01-01 B 20 NaT NaT
Upvotes: 2