anne

Reputation: 411

Making a dictionary of overlaps from a dictionary

This problem is teasing me:

I have 6 different sequences that overlap each other; they are named 1-6. I have written a function that stores the sequences in a dictionary, and a function that gives me the part of two sequences that overlaps.

Now I should use those two functions to construct a dictionary that holds the number of overlapping positions in both the left-to-right order and the right-to-left order.

The dictionary I have made looks like this:

{'1': 'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC',
 '2': 'CTTTACCCGGAAGAGCGGGACGCTGCCCTGCGCGATTCCAGGCTCCCCACGGG',
 '3': 'GTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGTCGTGAACACATCAGT',
 '4': 'TGCGAGGGAAGTGAAGTATTTGACCCTTTACCCGGAAGAGCG',
 '5': 'CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC',
 '6': 'TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGT'}

I should end up with a result like:

{'1': {'3': 0, '2': 1, '5': 1, '4': 0, '6': 29},
'3': {'1': 0, '2': 0, '5': 0, '4': 1, '6': 1},
'2': {'1': 13, '3': 1, '5': 21, '4': 0, '6': 0},
'5': {'1': 39, '3': 0, '2': 1, '4': 0, '6': 14},
'4': {'1': 1, '3': 1, '2': 17, '5': 2, '6': 0},
'6': {'1': 0, '3': 43, '2': 0, '5': 0, '4': 1}}

It seems impossible. I guess it's not, so if somebody could (not do it for me, but) push me in the right direction, that would be great.

Upvotes: 0

Views: 173

Answers (2)

Kyle Strand

Reputation: 16499

This is a bit of a complicated one-liner, but it should work. Using find_overlaps() as the function that finds overlaps and seq_dict as the original dictionary of sequences:

overlaps = {seq:{other_seq:find_overlaps(seq_dict[seq],seq_dict[other_seq])
    for other_seq in seq_dict if other_seq != seq} for seq in seq_dict}

Here it is with a bit nicer spacing:

overlaps = \
{seq:
    {other_seq:
        find_overlaps(seq_dict[seq],seq_dict[other_seq])
    for other_seq in seq_dict if other_seq != seq}
for seq in seq_dict}
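
To try the comprehension end to end, here is a minimal sketch. The question does not show find_overlaps(), so the version below is a hypothetical stand-in that assumes the function returns the length of the longest suffix of the first sequence that is also a prefix of the second. The short sequences are made up for illustration and are not the ones from the question.

```python
def find_overlaps(left, right):
    """Hypothetical stand-in for the asker's function.

    Returns the length of the longest suffix of `left`
    that is also a prefix of `right`.
    """
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


# Toy sequences (assumed for illustration only).
seq_dict = {'1': 'AAACCC', '2': 'CCCGGG', '3': 'GGGTTT'}

# Outer comprehension: one entry per sequence.
# Inner comprehension: that sequence compared against every other one.
overlaps = {
    seq: {
        other_seq: find_overlaps(seq_dict[seq], seq_dict[other_seq])
        for other_seq in seq_dict if other_seq != seq
    }
    for seq in seq_dict
}

print(overlaps)
```

Because every ordered pair is visited, overlaps['1']['2'] and overlaps['2']['1'] are computed separately, which is what gives both directions.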

Upvotes: 2

lynn

Reputation: 10794

The clean way:

dna = {
    '1': 'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC',
    '2': 'CTTTACCCGGAAGAGCGGGACGCTGCCCTGCGCGATTCCAGGCTCCCCACGGG',
    '3': 'GTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGTCGTGAACACATCAGT',
    '4': 'TGCGAGGGAAGTGAAGTATTTGACCCTTTACCCGGAAGAGCG',
    '5': 'CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC',
    '6': 'TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTG' \
         'TTTTTTTCTTCCAAGAGGTCGGAGT'
}

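def overlap(a, b):
    # Length of the longest suffix of a that is also a prefix of b.
    l = min(len(a), len(b))
    while True:
        # l == 0 needs its own check: a[-0:] is all of a, not an empty string.
        if a[-l:] == b[:l] or l == 0:
            return l
        l -= 1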

def all_overlaps(d):
    result = {}
    for k1, v1 in d.items():
        overlaps = {}
        for k2, v2 in d.items():
            if k1 == k2:
                continue
            overlaps[k2] = overlap(v1, v2)
        result[k1] = overlaps
    return result

print(all_overlaps(dna))

(By the way, you could've provided overlap yourself in the question to make it easier for everyone to answer.)
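
As a quick sanity check, the sketch below runs the same overlap function on sequences '1' and '6' from the question in both argument orders. The expected table in the question lists 29 for '1' against '6' and 0 for '6' against '1', so the order of the arguments is what separates the two directions. The variable names seq1, seq6, left_to_right and right_to_left are chosen here for illustration.

```python
def overlap(a, b):
    # Length of the longest suffix of a that is also a prefix of b.
    l = min(len(a), len(b))
    while True:
        if a[-l:] == b[:l] or l == 0:
            return l
        l -= 1


# Sequences '1' and '6' copied from the question.
seq1 = 'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC'
seq6 = ('TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTG'
        'TTTTTTTCTTCCAAGAGGTCGGAGT')

# End of '1' against the start of '6'.
left_to_right = overlap(seq1, seq6)
# End of '6' against the start of '1'.
right_to_left = overlap(seq6, seq1)

print(left_to_right, right_to_left)
```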

Upvotes: 1
