mbf94
mbf94

Reputation: 5

Compare 2 text files and export the matches Python

I currently have 1 text file (tab del) with 15,000 records. I have another text file (tab del) with 5000 records.

In the file with 5000 records there are some row that match with the file containing 15,000 records. These are identifiable by a column header named URN (unique record number). for example i may need URN 62294 to be taken out of the main file, but i don't know i have to take that one out until i compare the two files and see that it is in both.

how difficult is this to do in python?

Upvotes: 0

Views: 649

Answers (3)

Alex Fung
Alex Fung

Reputation: 2006

Try installing pandas with pip install pandas

Then run this:

import pandas as pd

filename1 = #main file
filename2 = #the other file
main = pd.read_csv(filename1,sep='\t') # sep='\t' is for tab delimited file
side = pd.read_csv(filename2,sep='\t')
main['URN'] = main['URN'].astype(int)
side['URN'] = side['URN'].astype(int)
merge = pd.merge(main,side,on='URN',how='inner') #how=inner means the URN value is in both 2 files
#merge = merge[merge['URN'] != 62294]
print (merge)
merge.to_excel('Output.xlsx,index=False)

Upvotes: 1

Pol
Pol

Reputation: 12

Is it difficult ? No, you could do it rather easily with

file1 = open("file1.txt","r")
results = []

for line in file1:
    file2 = open("file2.txt","r")
    for l in file2:
        if (l.split("\t")[0] == line.split("\t")[0]):
            results.append(l.split("\t")[0])
            break
    file2.close()
file1.close()

for i in results:
    print(i)

Now, is it the best way ? Probably not for large files. (Took me 74 seconds with your files).

Upvotes: 0

zuku
zuku

Reputation: 667

You can look at Pandas library. It will allow you to load both tables as dataframes and join them on needed column in sql-like style. It should be rather easy with documentation.

Upvotes: 0

Related Questions