Reputation: 5
I currently have 1 text file (tab del) with 15,000 records. I have another text file (tab del) with 5000 records.
In the file with 5000 records there are some row that match with the file containing 15,000 records. These are identifiable by a column header named URN (unique record number). for example i may need URN 62294 to be taken out of the main file, but i don't know i have to take that one out until i compare the two files and see that it is in both.
how difficult is this to do in python?
Upvotes: 0
Views: 649
Reputation: 2006
Try installing pandas with pip install pandas
Then run this:
import pandas as pd
filename1 = #main file
filename2 = #the other file
main = pd.read_csv(filename1,sep='\t') # sep='\t' is for tab delimited file
side = pd.read_csv(filename2,sep='\t')
main['URN'] = main['URN'].astype(int)
side['URN'] = side['URN'].astype(int)
merge = pd.merge(main,side,on='URN',how='inner') #how=inner means the URN value is in both 2 files
#merge = merge[merge['URN'] != 62294]
print (merge)
merge.to_excel('Output.xlsx,index=False)
Upvotes: 1
Reputation: 12
Is it difficult ? No, you could do it rather easily with
file1 = open("file1.txt","r")
results = []
for line in file1:
file2 = open("file2.txt","r")
for l in file2:
if (l.split("\t")[0] == line.split("\t")[0]):
results.append(l.split("\t")[0])
break
file2.close()
file1.close()
for i in results:
print(i)
Now, is it the best way ? Probably not for large files. (Took me 74 seconds with your files).
Upvotes: 0