Reputation: 1151
My input file,
ID1 ID2 value
ID3 ID6 value
ID2 ID1 value
ID4 ID5 value
ID6 ID5 value
ID5 ID4 value
ID7 ID2 value
Desired output, file1.txt
ID1 ID2 value ID2 ID1 value
ID4 ID5 value ID5 ID4 value
file2.txt
ID3 ID6 value
ID6 ID5 value
ID7 ID2 value
I am trying to get bi-directional best matches. If ID1 has ID2 as a hit and ID2 also has ID1 as a hit, print the pair in file1; otherwise print the line in file2. What I tried was to make a copy of the input file and build a dictionary from each file. But this writes the output without the values (10 columns). How can I modify it?
fileA = open("input.txt",'r')
fileB = open("input_copy.txt",'r')
output = open("out.txt",'w')
# output2 was never opened in the original snippet; the file name here is assumed
output2 = open("out2.txt",'w')
dictA = dict()
for line1 in fileA:
    new_list=line1.rstrip('\n').split('\t')
    query=new_list[0]
    subject=new_list[1]
    # only the two IDs are stored, so the remaining columns are lost here
    dictA[query] = subject
dictB = dict()
for line1 in fileB:
    new_list=line1.rstrip('\n').split('\t')
    query=new_list[0]
    subject=new_list[1]
    dictB[query] = subject
SharedPairs ={}
NotSharedPairs ={}
for id1 in dictA.keys():
    value1=dictA[id1]
    if value1 in dictB.keys():
        if id1 == dictB[value1]:
            SharedPairs[value1] = id1
        else:
            NotSharedPairs[value1] = id1
for key in SharedPairs.keys():
    line = key +'\t' + SharedPairs[key]+'\n'
    output.write(line)
for key in NotSharedPairs.keys():
    line = key +'\t' + NotSharedPairs[key]+'\n'
    output2.write(line)
Upvotes: 0
Views: 77
Reputation: 158
import csv
data = csv.reader(open('data.tsv'), delimiter='\t')
id_list = []
for item in data:
    # expects exactly three tab-separated columns per row
    (x, y, val) = item
    id_list.append((x, y, val))
# a row goes to file1 only if the reversed pair exists with the same value
file1 = [item for item in id_list if (item[1], item[0], item[2]) in id_list]
file2 = [item for item in id_list if (item[1], item[0], item[2]) not in id_list]
print(file1)
print(file2)
Upvotes: 1
Reputation: 31339
You can use sets to solve it easily:
#!/usr/bin/env python
# ordered pairs (ID1, ID2)
oset = set()
# reversed pairs (ID2, ID1)
rset = set()
with open('input.txt') as f:
    for line in f:
        first, second, val = line.strip().split()
        if first < second:
            oset.add((first, second, val,))
        else:
            # note that this reverses second and first for matching purposes
            rset.add((second, first, val,))
print("common: %s" % str(oset & rset))
print("diff: %s" % str(oset ^ rset))
Output:
common: set([('ID4', 'ID5', 'value'), ('ID1', 'ID2', 'value')])
diff: set([('ID3', 'ID6', 'value'), ('ID5', 'ID6', 'value'), ('ID2', 'ID7', 'value')])
It doesn't handle pairs like (ID1, ID1), but you could add those to a third set and do whatever you decide with them.
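To get the two output files from the question with every column kept, the same idea can be sketched with a dictionary keyed by the (query, subject) pair instead of sets. This is only a rough sketch under a few assumptions: the function names `split_reciprocal` and `split_file` are made up for illustration, the two IDs are the first two whitespace-separated columns, each ordered pair appears on at most one row, and the interpreter is Python 3.7+ so the dict keeps input order. Self-pairs such as (ID1, ID1) are collected in a third list.

```python
def split_reciprocal(lines):
    """Return (paired, single, self_hits) lists of output lines.

    Whole input lines are kept, so any columns after the two IDs survive.
    """
    # Map (query, subject) -> full line. Assumes one row per ordered pair;
    # if a pair repeats, the first row wins.
    rows = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rows.setdefault((fields[0], fields[1]), line.rstrip("\r\n"))

    paired, single, self_hits = [], [], []
    done = set()  # reverse rows already emitted as part of a pair
    for (query, subject), line in rows.items():
        if (query, subject) in done:
            continue
        if query == subject:
            self_hits.append(line)
        elif (subject, query) in rows:
            # reciprocal hit: join the row and its reverse on one line
            paired.append(line + "\t" + rows[(subject, query)])
            done.add((subject, query))
        else:
            single.append(line)
    return paired, single, self_hits


def split_file(in_path, paired_path, single_path):
    """Read in_path and write reciprocal pairs and one-way hits to two files."""
    with open(in_path) as handle:
        paired, single, self_hits = split_reciprocal(handle)
    with open(paired_path, "w") as out:
        out.writelines(row + "\n" for row in paired)
    with open(single_path, "w") as out:
        # Writing self-hits alongside the one-way hits is an arbitrary choice.
        out.writelines(row + "\n" for row in single + self_hits)


# Small in-memory demo using the rows from the question (tab-separated).
sample = [
    "ID1\tID2\tvalue\n", "ID3\tID6\tvalue\n", "ID2\tID1\tvalue\n",
    "ID4\tID5\tvalue\n", "ID6\tID5\tvalue\n", "ID5\tID4\tvalue\n",
    "ID7\tID2\tvalue\n",
]
paired, single, self_hits = split_reciprocal(sample)
print(paired)
print(single)
```

For the files in the question, something like `split_file("input.txt", "file1.txt", "file2.txt")` should do it. Because the full line is stored rather than just the subject ID, it should not matter whether a row has 3 columns or 10.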
Upvotes: 1