Jolin
Jolin

Reputation: 241

Python Merge Key by having condition ( Ranges ) in 2 DataFrames using python

I would like to merge (using how = 'left') whereas dataframe_A is on the left and data_frame_B is on the right. The Column/index level names to join on are "name","weight" and money". The height and weight difference is allow up to 2 cm.
I am not using for loop as my dataset is too big, it will take 2 days to complete


E.g.

INPUT

dataframe_A name:John, height: 170, weight :70
dataframe_B name:John, height 172, weight :69

OUTPUT

output_dataframe : name:John,height: 170, weight :70, money:100, grade :1

I have two dataframe :

dataframe_A = pd.DataFrame({'name': ['John', 'May', 'Jane', 'Sally'],
                            'height': [170, 180, 160, 155],
                            'weight': [70, 88, 60, 65],
                            'money': [100, 1120, 2000, 3000]})

dataframe_B = pd.DataFrame({'name': ['John', 'May', 'Jane', 'Sally'],
                            'height': [172, 180, 160, 155],
                            'weight': [69, 88, 60, 65],
                            'grade': [1, 2, 3, 4]})

In selecting statment should be,

    SELECT * FROM dataframe_A LEFT JOIN dataframe_B 
ON dataframe_A.name= dataframe_B.name and 
dataframe_A.height => dataframe_B.height+2 or
dataframe_A.height <= dataframe_B.height-2 and
dataframe_A.weight=> dataframe_B.weight+2 or
dataframe_A.weight<= dataframe_B.weight-2 
;

But I am unsure how to put it in python as i am still learning

output_dataframe =pd.merge(dataframe_A,dataframe_B,how='left',on=['name','height','weight'] + ***the range condition***

Upvotes: 1

Views: 98

Answers (1)

jezrael
jezrael

Reputation: 863311

Use merge first and then filter by boolean indexing with Series.between:

df = pd.merge(dataframe_A, dataframe_B, on='name', how='left', suffixes=('','_'))
m1 = df['height'].between(df['height_'] - 2, df['height_'] + 2)
m2 = df['weight'].between(df['weight_'] - 2, df['weight_'] + 2)

df = df.loc[m1 & m2, dataframe_A.columns.tolist() + ['grade']]
print (df)
    name  height  weight  money  grade
0   John     170      70    100      1
1    May     180      88   1120      2
2   Jane     160      60   2000      3
3  Sally     155      65   3000      4

Upvotes: 1

Related Questions