ThunderCloud
ThunderCloud

Reputation: 101

compare a two date columns of a data frame with another two data frames of second data frame in python

I have two dataframes df1 and df2 df1 contains month and two date columns

df1

Month      Month_Start    Month_End
Month1      2022-03-27    2022-04-30
Month2      2022-05-01    2022-05-28
Month3      2022-05-01    2022-06-25

another data frame df2

start_Month     end_Month      price
2022-03-27      2260-12-31      1
2022-03-27      2260-12-31      2
2022-03-27      2260-12-31      3


if Month_Start and Month_end of df1 is in between start_Month and end_Month of df2, assign price column value to Month column of df1

like following result

Month   price
Month1      1
Month2      1
Month3      1

I tried using for loops

for i in range(len(df2)):
    for j in range(len(df1)):
        if df2['start_Month'][i] <= df1['Month_Start'][j]<= df1['Month_End'][j] <= df2['end_Month'][i]:
            new.loc[len(new.index)] = [df1['month'][j], df2['price'][i]]


but taking lot of time for execution for 1000+ rows.

ANY IDEAS?

Upvotes: 3

Views: 109

Answers (2)

blackraven
blackraven

Reputation: 5637

Assuming these are your dataframes:

import pandas as pd

df1 = pd.DataFrame({ 'Month': ['Month1', 'Month2', 'Month3'], 
                     'Month_Start': ['2022-03-27', '2022-05-01', '2022-05-01'], 
                     'Month_End': ['2022-04-30', '2022-05-28', '2022-06-25'] })
df1['Month_Start'] = pd.to_datetime(df1['Month_Start'])
df1['Month_End'] = pd.to_datetime(df1['Month_End'])

df2 = pd.DataFrame({ 'start_Month': ['2022-03-01', '2022-05-01', '2022-06-01'], 
                     'end_Month': ['2022-04-30', '2022-05-30', '2022-06-30'],
                     'price': [1, 2, 3] })
df2['start_Month'] = pd.to_datetime(df2['start_Month'])
df2['end_Month'] = pd.to_datetime(df2['end_Month'])

print(df1)
    Month Month_Start  Month_End
0  Month1  2022-03-27 2022-04-30
1  Month2  2022-05-01 2022-05-28
2  Month3  2022-05-01 2022-06-25

print(df2)    #note validity periods do not overlap, so only 1 price is valid!
  start_Month  end_Month  price
0  2022-03-01 2022-04-30      1
1  2022-05-01 2022-05-30      2
2  2022-06-01 2022-06-30      3

I would define an external function to check the validity period, then return the corresponding price. Note that if more than 1 corresponding validity periods are found, the first one will be returned. If no corresponding period is found, a null value is returned.

def check_validity(row):
    try:
        return int(df2['price'][(df2['start_Month']<=row['Month_Start']) & (row['Month_End']<=df2['end_Month'])].values[0])
    except:
        return
    
df1['price'] = df1.apply(lambda x: check_validity(x), axis=1)
print(df1)

Output:

    Month Month_Start  Month_End  price
0  Month1  2022-03-27 2022-04-30    1.0
1  Month2  2022-05-01 2022-05-28    2.0
2  Month3  2022-05-01 2022-06-25    NaN

Upvotes: 0

Bushmaster
Bushmaster

Reputation: 4608

Is there a common column where you can combine these two dataframes? such as id. If there is, it would be much more accurate to apply the conditions after combining these two tables. You can try the code below based on current data and conditions (Dataframes that are not the same size may have a problem.).

import pandas as pd
import numpy as np
df1=pd.DataFrame(data={'Month':['Month1','Month2','Month3'],
                       'Month_Start':['2022-03-27','2022-05-01','2022-05-01'],
                       'Month_End':['2022-04-30','2022-05-28','2022-06-25']})

df2=pd.DataFrame(data={'start_Month':['2022-03-27','2022-03-27','2022-03-27'],
                       'end_Month':['2260-12-31','2260-12-31','2260-12-31'],
                 'price':[1,2,3]})

con=[(df1['Month_Start']>= df2['start_Month']) & (df1['Month_End']<= df2['end_Month'])]
cho=[df2['price']]
df1['price']=np.select(con,cho,default=np.nan)#

Upvotes: 0

Related Questions