Finding common elements between a list and a dictionary in Python

Question

I have two files. The first is a list of proteins:

TRIUR3_05947-P1
TRIUR3_06394-P1
Traes_1BL_EB95F4919.2

The second is a tab-delimited "dictionary" file of contigs and proteins:

contig22 TRIUR3_05947-P1
contig15 TRIUR3_05947-P1
contig1 Traes_1BL_EB95F4919.2
contig67 Traes_1BL_EB95F4919.2
contig98 Traes_1BL_EB95F4919.2
contig45 MLOC_71599.4

My desired output is every line of the second file whose protein also appears in the list, like this:

contig22 TRIUR3_05947-P1
contig15 TRIUR3_05947-P1
contig1 Traes_1BL_EB95F4919.2
contig67 Traes_1BL_EB95F4919.2
contig98 Traes_1BL_EB95F4919.2

This is my script below, but it reports each common protein just once. I guess the earlier entries are being overwritten. How can this be solved?

f1=open('mydict.txt','r')
f2=open('mylist.txt','r')
output = open('result.txt','w')
dictA= dict()
for line1 in f1:
    listA = line1.rstrip('\n').split('\t')
    dictA[listA[1]] = listA[0]

for line1 in f2:
    new_list=line1.rstrip('\n').split()
    query=new_list[0]
    if query in dictA:
        listA[0] = dictA[query]
        output.write(query+'\t'+str(listA[0])+'\n')

eskaev · Accepted Answer

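You are doing this the wrong way around. If you store the 'dictionary file' in a dictionary structure, using the protein names as keys, you will lose information: a key can hold only one value, so each new contig for a protein replaces the one stored before it.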
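To see the overwriting in isolation, here is a minimal sketch using two of the sample rows from the question. The names `rows` and `contig_by_protein` are only illustrative and do not come from the original script.

```python
# Two sample rows from the question that share the same protein name.
rows = [
    ("contig22", "TRIUR3_05947-P1"),
    ("contig15", "TRIUR3_05947-P1"),
]

contig_by_protein = {}
for contig, protein in rows:
    # Assigning to an existing key replaces the value stored earlier.
    contig_by_protein[protein] = contig

print(contig_by_protein)  # {'TRIUR3_05947-P1': 'contig15'}
```

Only the last contig read for each protein survives, which is why the script reports each protein once.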

A better way to do this is to read the list of proteins first and store all the protein names in a set. Then read the dictionary file and write out every line whose protein name is in the set.

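with open('mylist.txt') as mylist:
    # Collect the protein names in a set for fast membership tests.
    proteins = set(line.strip() for line in mylist)

with open('mydict.txt') as mydict, open('result.txt', 'w') as output:
    for line in mydict:
        # Each line holds a contig and a protein name separated by whitespace.
        _, protein = line.strip().split()
        if protein in proteins:
            # Write the original line unchanged, so every matching contig is kept.
            output.write(line)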
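If you do need a dictionary keyed by protein (for example, to look up contigs repeatedly), one possible alternative is to store a list of contigs per protein. The sketch below is not part of the approach above; the helper `group_contigs_by_protein` and the in-memory sample data are assumptions made for illustration, standing in for the two files.

```python
from collections import defaultdict


def group_contigs_by_protein(lines):
    """Map each protein name to the list of contigs that mention it."""
    contigs_by_protein = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        contig, protein = fields
        contigs_by_protein[protein].append(contig)
    return contigs_by_protein


# Sample data copied from the question (stands in for mydict.txt and mylist.txt).
dict_lines = [
    "contig22\tTRIUR3_05947-P1",
    "contig15\tTRIUR3_05947-P1",
    "contig1\tTraes_1BL_EB95F4919.2",
    "contig67\tTraes_1BL_EB95F4919.2",
    "contig98\tTraes_1BL_EB95F4919.2",
    "contig45\tMLOC_71599.4",
]
wanted = ["TRIUR3_05947-P1", "TRIUR3_06394-P1", "Traes_1BL_EB95F4919.2"]

grouped = group_contigs_by_protein(dict_lines)

# .get() avoids creating empty entries for proteins that have no contigs.
results = [
    (contig, protein)
    for protein in wanted
    for contig in grouped.get(protein, [])
]

for contig, protein in results:
    print(f"{contig}\t{protein}")
```

Note that the output here follows the order of the protein list rather than the order of the dictionary file, and the whole dictionary file is held in memory, so the set-based version is usually the simpler choice for a one-off filter.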
