Reputation: 411
This problem is teasing me:
I have 6 different sequences that each overlap, they are name 1-6. I have made a function that represents the sequences in a dictionary, and a function that gives me the part of the sequences that overlap.
Now i should use those 2 functions to construct a dictionary that take the number of overlapping positions in both the right-to-left order and in the left-to-right oder.
The dictionary I have made look like:
{'1': 'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC',
'2': 'CTTTACCCGGAAGAGCGGGACGCTGCCCTGCGCGATTCCAGGCTCCCCACGGG',
'3': 'GTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGTCGTGAACACATCAGT',
'4': 'TGCGAGGGAAGTGAAGTATTTGACCCTTTACCCGGAAGAGCG',
'5': 'CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC',
'6': 'TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGT'}
I should end up with a result like:
{'1': {'3': 0, '2': 1, '5': 1, '4': 0, '6': 29},
'3': {'1': 0, '2': 0, '5': 0, '4': 1, '6': 1},
'2': {'1': 13, '3': 1, '5': 21, '4': 0, '6': 0},
'5': {'1': 39, '3': 0, '2': 1, '4': 0, '6': 14},
'4': {'1': 1, '3': 1, '2': 17, '5': 2, '6': 0},
'6': {'1': 0, '3': 43, '2': 0, '5': 0, '4': 1}}
I seems impossible. I guess it's not, so if somebody could (not do it) but push me in the right direction, it would be great.
Upvotes: 0
Views: 173
Reputation: 16499
This is a bit of a complicated one-liner, but it should work. Using find_overlaps()
as the function that finds overlaps and seq_dict
as the original dictionary of sequences:
overlaps = {seq:{other_seq:find_overlaps(seq_dict[seq],seq_dict[other_seq])
for other_seq in seq_dict if other_seq != seq} for seq in seq_dict}
Here it is with a bit nicer spacing:
overlaps = \
{seq:
{other_seq:
find_overlaps(seq_dict[seq],seq_dict[other_seq])
for other_seq in seq_dict if other_seq != seq}
for seq in seq_dict}
Upvotes: 2
Reputation: 10794
The clean way:
dna = {
'1': 'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC',
'2': 'CTTTACCCGGAAGAGCGGGACGCTGCCCTGCGCGATTCCAGGCTCCCCACGGG',
'3': 'GTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGTCGTGAACACATCAGT',
'4': 'TGCGAGGGAAGTGAAGTATTTGACCCTTTACCCGGAAGAGCG',
'5': 'CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC',
'6': 'TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTG' \
'TTTTTTTCTTCCAAGAGGTCGGAGT'
}
def overlap(a, b):
l = min(len(a), len(b))
while True:
if a[-l:] == b[:l] or l == 0:
return l
l -= 1
def all_overlaps(d):
result = {}
for k1, v1 in d.items():
overlaps = {}
for k2, v2 in d.items():
if k1 == k2:
continue
overlaps[k2] = overlap(v1, v2)
result[k1] = overlaps
return result
print all_overlaps(dna)
(By the way, you could've provided overlap
yourself in the question to make it easier for everyone to answer.)
Upvotes: 1