Reputation: 87
I have Dataframe 1:
Hotel DateFrom DateTo Room
BBB 2019-10-29 2020-03-27 DHS
BBB 2020-03-28 2020-10-30 DHS
BBB 2020-10-31 2021-03-29 DHS
BBB 2021-03-30 2099-01-01 DHS
And Dataframe 2:
Hotel DateFrom DateTo Room Food
BBB 2020-03-01 2020-04-24 DHS A
BBB 2020-04-25 2020-05-03 DHS B
BBB 2020-05-04 2020-05-31 DHS C
BBB 2020-06-01 2020-06-22 DHS D
BBB 2020-06-23 2020-08-26 DHS E
BBB 2020-08-27 2020-11-30 DHS F
I need to check if each row in df1 and if df1_DateFrom is between df2_DateFrom and df2_DateTo. Then i need to get that food code from df2 to new column in df1 or as new df3 shown below.
The result would look like this:
df3:
Hotel DateFrom DateTo Room Food
BBB 2019-10-29 2020-03-27 DHS
BBB 2020-03-28 2020-10-30 DHS A
BBB 2020-10-31 2021-03-29 DHS F
BBB 2021-03-30 2099-01-01 DHS
I would really appreciate any help with this. I am kinda new on Pandas and still learning and i must say that it is bit complicated for me.
Upvotes: 2
Views: 1362
Reputation: 7212
Much less elegant then Quang Hoang's answer, but the solution using np.piecewise
would look like this. See also https://stackoverflow.com/a/30630905/4873972
import pandas as pd
import numpy as np
from io import StringIO
# Creating the dataframes.
df1 = pd.read_table(StringIO("""
Hotel DateFrom DateTo Room
BBB 2019-10-29 2020-03-27 DHS
BBB 2020-03-28 2020-10-30 DHS
BBB 2020-10-31 2021-03-29 DHS
BBB 2021-03-30 2099-01-01 DHS
"""), sep=r"\s+").convert_dtypes()
df1["DateFrom"] = pd.to_datetime(df1["DateFrom"])
df1["DateTo"] = pd.to_datetime(df1["DateTo"])
df2 = pd.read_table(StringIO("""
Hotel DateFrom DateTo Room Food
BBB 2020-03-01 2020-04-24 DHS A
BBB 2020-04-25 2020-05-03 DHS B
BBB 2020-05-04 2020-05-31 DHS C
BBB 2020-06-01 2020-06-22 DHS D
BBB 2020-06-23 2020-08-26 DHS E
BBB 2020-08-27 2020-11-30 DHS F
"""), sep=r"\s+").convert_dtypes()
df2["DateFrom"] = pd.to_datetime(df2["DateFrom"])
df2["DateTo"] = pd.to_datetime(df2["DateTo"])
# Avoid zero index for merging later on.
df2["id"] = np.arange(1, len(df2) +1 )
# Find matching indexes.
df1["df2_id"] = np.piecewise(
np.zeros(len(df1)),
[(df1["DateFrom"].values >= start_date) & (df1["DateFrom"].values <= end_date) for start_date, end_date in zip(df2["DateFrom"].values, df2["DateTo"].values)],
df2.index.values
)
# Merge on matching indexes.
df1.merge(df2["Food"], left_on="df2_id", right_index=True, how="left")
Output:
Hotel DateFrom DateTo Room Food
0 BBB 2019-10-29 2020-03-27 DHS NaN
1 BBB 2020-03-28 2020-10-30 DHS A
2 BBB 2020-10-31 2021-03-29 DHS F
3 BBB 2021-03-30 2099-01-01 DHS NaN
Upvotes: 0
Reputation: 150735
You can do a cross merge and query:
# recommend dealing with datetime type:
df1['DateFrom'],df1['DateTo'] = pd.to_datetime(df1['DateFrom']),pd.to_datetime(df1['DateTo'])
df2['DateFrom'],df2['DateTo'] = pd.to_datetime(df2['DateFrom']),pd.to_datetime(df2['DateTo'])
new_df = (df1.reset_index().merge(df2, on=['Hotel','Room'],
how='left', suffixes=['','_'])
.query('DateFrom_ <= DateFrom <= DateTo_')
)
df1['Food'] = new_df.set_index('index')['Food']
Output:
Hotel DateFrom DateTo Room Food
0 BBB 2019-10-29 2020-03-27 DHS NaN
1 BBB 2020-03-28 2020-10-30 DHS A
2 BBB 2020-10-31 2021-03-29 DHS F
3 BBB 2021-03-30 2099-01-01 DHS NaN
Upvotes: 3