Rob John

Reputation: 287

Search contents of one file with contents of a second file using Python

I have the following code, which compares the items in the first column of input file 1 with the contents of input file 2:

import os

newfile2=[]
outfile=open("outFile.txt","w")
infile1=open("infile1.txt", "r")
infile2=open("infile2.txt","r")
for file1 in infile1:
    #print file1
    file1=str(file1).strip().split("\t")
    print file1[0]
    for file2 in infile2:
        if file2 == file1[0]:
            outfile.write(file2.replace(file2,file1[1]))
        else:
            outfile.write(file2)

input file 1:

Modex_xxR_SL1344_3920   Modex_sseE_SL1344_3920
Modex_seA_hemN  Modex_polA_SGR222_3950
Modex_GF2333_3962_SL1344_3966   Modex_ertd_wedS

input file 2:

Sardes_xxR_SL1344_4567  
Modex_seA_hemN
MOdex_uui_gytI

Since the input file 1 item (column 1, row 2) matches an item in input file 2 (row 2), the column 2 item from input file 1 replaces the matching input file 2 item in the output file, as follows (required output):

Sardes_xxR_SL1344_4567  
Modex_polA_SGR222_3950
MOdex_uui_gytI

So far my code is only outputting the items in input file 1. Can someone help me modify this code? Thanks.

Upvotes: 1

Views: 139

Answers (1)

Adam Smith

Reputation: 54253

Looks like you have a TSV file, so let's go ahead and treat it as such. We'll build a TSV reader with csv.reader(fileobj, delimiter="\t"), iterate through infile1 with it, and build a translation dict from the rows. The dictionary's keys are the first column of each row and its values are the second column.

Then, using dict.get, we can translate each line from infile2 if it exists in our translation dict, or just write the line itself if there's no translation available.

import csv

with open("infile1.txt", 'r') as infile1,\
     open('infile2.txt', 'r') as infile2,\
     open('outfile.txt', 'w') as outfile:
    trans_dict = dict(csv.reader(infile1, delimiter="\t"))

    for line in infile2:
        outfile.write(trans_dict.get(line.strip(),line.strip()) + "\n")

Result:

# contents of outfile.txt
Sardes_xxR_SL1344_4567
Modex_polA_SGR222_3950
MOdex_uui_gytI
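
As for why the nested loop in the question misbehaves: a file object is a one-shot iterator, so the inner loop over infile2 consumes the whole file during the first pass of the outer loop and yields nothing on later passes. Separately, each line read from infile2 still carries its trailing newline, so comparing it with the stripped file1[0] will not match without a strip(). Here is a minimal sketch of the first effect, written for Python 3 and using io.StringIO as a stand-in for the real files (the short sample strings are made up for illustration):

```python
import io

# In-memory stand-ins for infile1.txt and infile2.txt (illustrative data).
infile1 = io.StringIO("a\t1\nb\t2\n")
infile2 = io.StringIO("x\nb\n")

passes = []
for row in infile1:
    # Count how many lines the inner loop sees on each outer pass.
    passes.append(sum(1 for _ in infile2))

print(passes)  # [2, 0]
```

Reading infile1 into a dict once, as above, avoids having to rewind or re-open infile2 for every row.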

EDIT as per your comment:

import csv

with open("infile1.txt", 'r') as infile1:
    # build our translation dict
    trans_dict = dict(csv.reader(infile1, delimiter="\t"))

with open("infile2.txt", 'r') as infile2,\
     open("outfile.txt", 'w') as outfile:
    # open the file to translate and our output file
    reader = csv.reader(infile2, delimiter="\t")
    # treat our file to translate like a tsv file instead of flat text
    for line in reader:
        # map each column through trans_dict, then write the whole row
        # back re-tab-delimited with a trailing newline
        outfile.write("\t".join(trans_dict.get(col, col) for col in line) + "\n")
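
The column-wise translation can also be tried without touching the disk. The sketch below wraps the same idea in a helper, translate_rows, which is a hypothetical name introduced here for illustration, and feeds it io.StringIO objects in place of the real files. The two-column target row is made-up sample data, not taken from the question. Note that dict(csv.reader(...)) assumes every row of the mapping file has exactly two columns and raises ValueError otherwise (for example, on a blank line).

```python
import csv
import io

def translate_rows(mapping_file, target_file):
    """Return target_file's rows with every cell looked up in the mapping.

    Hypothetical helper: cells without a translation are kept unchanged.
    """
    # First column -> second column; requires exactly two columns per row.
    trans_dict = dict(csv.reader(mapping_file, delimiter="\t"))
    return ["\t".join(trans_dict.get(col, col) for col in row)
            for row in csv.reader(target_file, delimiter="\t")]

# Illustrative in-memory data standing in for infile1.txt and infile2.txt.
mapping = io.StringIO("Modex_seA_hemN\tModex_polA_SGR222_3950\n")
target = io.StringIO("Sardes_xxR_SL1344_4567\tModex_seA_hemN\nMOdex_uui_gytI\n")

result = translate_rows(mapping, target)
print(result)
```

Here the second cell of the first row is replaced by Modex_polA_SGR222_3950, while the cells with no entry in the mapping pass through as they are.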

Upvotes: 2
