Reputation: 241
I would like to merge (using how = 'left') whereas dataframe_A is on the left and data_frame_B is on the right. The Column/index level names to join on are "name","weight" and money". The height and weight difference is allow up to 2 cm.
I am not using for loop as my dataset is too big, it will take 2 days to complete
E.g.
INPUT
dataframe_A name:John, height: 170, weight :70
dataframe_B name:John, height 172, weight :69
OUTPUT
output_dataframe : name:John,height: 170, weight :70, money:100, grade :1
I have two dataframe :
dataframe_A = pd.DataFrame({'name': ['John', 'May', 'Jane', 'Sally'],
'height': [170, 180, 160, 155],
'weight': [70, 88, 60, 65],
'money': [100, 1120, 2000, 3000]})
dataframe_B = pd.DataFrame({'name': ['John', 'May', 'Jane', 'Sally'],
'height': [172, 180, 160, 155],
'weight': [69, 88, 60, 65],
'grade': [1, 2, 3, 4]})
In selecting statment should be,
SELECT * FROM dataframe_A LEFT JOIN dataframe_B
ON dataframe_A.name= dataframe_B.name and
dataframe_A.height => dataframe_B.height+2 or
dataframe_A.height <= dataframe_B.height-2 and
dataframe_A.weight=> dataframe_B.weight+2 or
dataframe_A.weight<= dataframe_B.weight-2
;
But I am unsure how to put it in python as i am still learning
output_dataframe =pd.merge(dataframe_A,dataframe_B,how='left',on=['name','height','weight'] + ***the range condition***
Upvotes: 1
Views: 98
Reputation: 863311
Use merge
first and then filter by boolean indexing
with Series.between
:
df = pd.merge(dataframe_A, dataframe_B, on='name', how='left', suffixes=('','_'))
m1 = df['height'].between(df['height_'] - 2, df['height_'] + 2)
m2 = df['weight'].between(df['weight_'] - 2, df['weight_'] + 2)
df = df.loc[m1 & m2, dataframe_A.columns.tolist() + ['grade']]
print (df)
name height weight money grade
0 John 170 70 100 1
1 May 180 88 1120 2
2 Jane 160 60 2000 3
3 Sally 155 65 3000 4
Upvotes: 1