Reputation: 704
file1 :
col0 col1 col2 col3
1 20 10 -
1 25 30 +
2 50 40 -
2 60 70 +
file2:
col0 col1 col2 col3 col4 col5
1 25 5 - e1 e2
1 20 30 + e4 e24
2 45 35 - e9 e12
2 55 75 + e10 e21
I need to check for 3 conditions : 1) check if element in col0 is same (1 ==1)
2) if yes then check the col3 if its - or +, if - then the col1 value will be higher than col2 and the opposite if its +,
3) check if the range is within the col1 and col2 from file2 i.e in row two 25-30 fall within the range 20-30 if all conditions met, out put rows in file 1 and col4 and col5 in file 2 (which are some meta info)
col0 col1 col2 col3 col4 col5
1 20 10 - e1 e2
1 25 30 + e4 e24
2 50 40 - e9 e12
2 60 70 + e10 e21
simple python code would have a condition like this
for i in file1:
for j in file2:
if i[1]<j[1] and i[2]<j[2] and i[0]==j[0]:
print j
not sure how to account for the sign, How can i achieve this using pandas, or pandas would not be a right approach?
Upvotes: 1
Views: 1266
Reputation: 21264
Here's one approach:
Generate sample data provided by OP
# copy file1 data from OP, then run:
df1 = pd.read_clipboard().reset_index()
# copy file2 data from OP, then run:
df2 = pd.read_clipboard().reset_index()
Merge and filter data
# merge data on row index and col0 value
df3 = (df1.merge(df2.drop('col3',1), on=['index','col0'], suffixes=('_1','_2'))
.drop('index',1))
# sort range cols from file2
df3[['col1_2','col2_2']] = df3[['col1_2','col2_2']].apply(sorted, axis='columns')
# filter based on file1 range within file2 range, drop extra cols
df3 = (df3.loc[df3.col1_1.between(df3.col1_2, df3.col2_2) & df3.col2_1.between(df3.col1_2, df3.col2_2)]
.drop(['col1_2','col2_2'], 1))
print(df3)
Output
col0 col1_1 col2_1 col3 col4 col5
0 1 20 10 - e1 e2
1 1 25 30 + e4 e24
3 2 60 70 + e10 e21
Upvotes: 1