Gita
Gita

Reputation: 1

Compare two files based on column and output should contain both matched and non matched entries

I am new to python. I have written a code and it gives desired solution. Would it be possible for you to help me rewrite the same in pandas/NumPy?

Here are the file content:

File1 contains the following information:

ID1 USA 18  200

ID1 IND  1  100

ID1 BEL 186 150

ID2 UK  185 200

ID3 UK  200 130

file2 contains:

mgk:ID1:brs 1-20    5000

vho:ID1:gld 30-40    4000

sun:ID3:slv 198-400 5500

My code:

`with open(r"x.txt","r")as X,open(r'y.txt',"r")as Y: datax=[] datay=[]

for eachx in X:
    datax.append(eachx.strip())
    
for eachy in Y:
    datay.append(eachy.strip())
    
for eachofx in datax:
    dataxplitted=eachofx.split("\t")
    xid=dataxplitted[0]
    locx=int(dataxplitted[2])
    #print (xid,locx)
    for eachofy in datay:
        dataysplitted=eachofy.split("\t")
        Yends=dataysplitted[1].split("-")
        ySTART=int(Yends[0])
        ySTOP=int(Yends[1])
        yIDdetails=(dataysplitted[0].split(":"))
        yid=yIDdetails[1]
       # print(yIDdetails,yid)
        
        if(xid==yid):
            if (int(ySTART)<= locx <=int(ySTOP)):
                
                    print (xid,mutID,locx,"exists",ySTART,ySTOP,locx-ySTART,ySTOP-locx)

` Output:

ID1 18 exits 1 20 17 2

ID2 1 exits 1 20 0 19

ID3 200 exits 198 400 2 200

Explanation: The files are to be compared based on ID1,ID2 etc. In file2(y.txt) it is a part of a string separated by ":".Once I find a matching, I need to check the values in the third column of File1 lies between the values in the second column of File2 (values are separated by "-"). If yes, I need to print "exists". Also I like to have a difference of those values in file1 to the two values (separated by "-") and print it along with. Thank you all.

Upvotes: 0

Views: 122

Answers (1)

shreyansh
shreyansh

Reputation: 108

If you can use pandas:

import pandas as pd
df1 = pd.read_csv('x.txt', sep=' ', names=['id', 'country', 'num1', 'num2'])
df2 = pd.read_csv('y.txt', sep=' ', names=['id', 'num1', 'num3'])

final_df = df1.merge(df2, on = ['id', 'num1'], how='outer')
final_df['status'] = final_df.apply(lambda x: 'exist' if pd.notnull(x['num3']) else 'no', axis=1)

Upvotes: 1

Related Questions