Reputation: 1
I am new to python. I have written a code and it gives desired solution. Would it be possible for you to help me rewrite the same in pandas/NumPy?
Here are the file content:
File1
contains the following information:
ID1 USA 18 200
ID1 IND 1 100
ID1 BEL 186 150
ID2 UK 185 200
ID3 UK 200 130
file2
contains:
mgk:ID1:brs 1-20 5000
vho:ID1:gld 30-40 4000
sun:ID3:slv 198-400 5500
My code:
`with open(r"x.txt","r")as X,open(r'y.txt',"r")as Y: datax=[] datay=[]
for eachx in X:
datax.append(eachx.strip())
for eachy in Y:
datay.append(eachy.strip())
for eachofx in datax:
dataxplitted=eachofx.split("\t")
xid=dataxplitted[0]
locx=int(dataxplitted[2])
#print (xid,locx)
for eachofy in datay:
dataysplitted=eachofy.split("\t")
Yends=dataysplitted[1].split("-")
ySTART=int(Yends[0])
ySTOP=int(Yends[1])
yIDdetails=(dataysplitted[0].split(":"))
yid=yIDdetails[1]
# print(yIDdetails,yid)
if(xid==yid):
if (int(ySTART)<= locx <=int(ySTOP)):
print (xid,mutID,locx,"exists",ySTART,ySTOP,locx-ySTART,ySTOP-locx)
` Output:
ID1 18 exits 1 20 17 2
ID2 1 exits 1 20 0 19
ID3 200 exits 198 400 2 200
Explanation: The files are to be compared based on ID1,ID2 etc. In file2(y.txt) it is a part of a string separated by ":".Once I find a matching, I need to check the values in the third column of File1 lies between the values in the second column of File2 (values are separated by "-"). If yes, I need to print "exists". Also I like to have a difference of those values in file1 to the two values (separated by "-") and print it along with. Thank you all.
Upvotes: 0
Views: 122
Reputation: 108
If you can use pandas:
import pandas as pd
df1 = pd.read_csv('x.txt', sep=' ', names=['id', 'country', 'num1', 'num2'])
df2 = pd.read_csv('y.txt', sep=' ', names=['id', 'num1', 'num3'])
final_df = df1.merge(df2, on = ['id', 'num1'], how='outer')
final_df['status'] = final_df.apply(lambda x: 'exist' if pd.notnull(x['num3']) else 'no', axis=1)
Upvotes: 1