Rspacer

Reputation: 2429

How to compare non-identical lists and derive values from a dictionary in Python?

Here is a dictionary that stores the single-letter amino acid for each codon (a triplet of bases such as ATG, GCT, etc.).

aminoacid = {'TTT' : 'F','TTC' : 'F','TTA' : 'L','TTG' : 'L','CTT' : 'L','CTC' : 'L','CTA' : 'L','CTG' : 'L','ATT' : 'I','ATC' : 'I','ATA' : 'I','ATG' : 'M','GTT' : 'V','GTC' : 'V','GTA' : 'V','GTG' : 'V','TCT' : 'S','TCC' : 'S','TCA' : 'S','TCG' : 'S','CCT' : 'P','CCC' : 'P','CCA' : 'P','CCG' : 'P','ACT' : 'T','ACC' : 'T','ACA' : 'T','ACG' : 'T','GCT' : 'A','GCG' : 'A','GCA' : 'A','GCG' : 'A','TAT' : 'Y','TAC' : 'Y','TAA' : 'STOP','TAG' : 'STOP','CAT' : 'H','CAC' : 'H','CAA' : 'Q','CAG' : 'Q','AAT' : 'N','AAC' : 'N','AAA' : 'K','AAG' : 'K','GAT' : 'D','GAC' : 'D','GAA' : 'E','GAG' : 'E','TGT' : 'C','TGC' : 'C','TGA' : 'STOP','TGG' : 'W','CGT' : 'R','CGC' : 'R','CGA' : 'R','CGG' : 'R','AGT' : 'S','AGC' : 'S','AGA' : 'R','AGC' : 'R','CGT' : 'G','GGC' : 'G','GGA' : 'G','GGG' : 'G',}

As one can see, several codons can code for the same amino acid (e.g. GGT, GGC, GGA and GGG all code for glycine (G)). Such codons are synonymous (Dsyn); codons that code for different amino acids are non-synonymous (Dnonsyn).

This is an extension of this question, if anyone is interested.

I have the following sequences:

list1 = ['ACT','ACT','nonsyn','G','L']

list2 = ['ACT','ACC','GGT','ATT']

Here:

- list1 is derived from a previous calculation; it is a mix of codons, amino acids (single-letter entries) and nonsyn (null).
- list2 is a list containing triplet codons.

In this code, I need to compare list1 and list2. Each element of list1 must be compared only with the corresponding element of list2, as follows:

  1. If codons are present in both lists, compare the bases:
     a. If the bases are identical (e.g. ACT, ACT), do nothing.
     b. If the bases are not identical (e.g. ACT, ACC), look up both amino acids in the dictionary. If the amino acids are the same, increase countDsyn by 1; if they are not the same, increase countDnonsyn by 1.

  2. If 'nonsyn' in list1 is compared to list2, do nothing.

  3. If an amino acid from list1 is compared to list2, look up the corresponding amino acid for the list2 codon in the aminoacid dictionary:
     a. If the amino acids are identical, increment countDsyn by 1.
     b. If the amino acids are not identical, increment countDnonsyn by 1.

Final output for the given case:

Dsyn = 2
Dnonsyn = 1

I need help checking whether I am looking up the values in the dictionary correctly inside the if statements.

Code Attempted:

countDsyn = 0
countDnonsyn = 0

for pos1,value1 in enumerate(list1):
    for pos2,value2 in enumerate(list2):
        if value1 in list1 = combination(ATGC,3): #eg. ACT,AGT,TTT etc. There are can be 64 such combinations
            if value1 in list1 == value2 in list2: #eg. ACT, ACT
                #Do nothing
            if value1 in list1 != value1 in list2: #eg. ACT,ACC
                if value1[aminoacid] == value2[aminoacid]:
                    countDsyn =+1
                else:
                    countDnonsyn =+1
        if value1 in list1 = "nonsyn":
            #Do nothing
        if value1 in list1 = (A-Z): #eg. 'G''L' etc.
            if value1 == value2[aminoacid] #eg. comparing 'G' and the aminoacid value of GTT from the dictionary
                countDsyn =+ 1
            if value1 != value2[aminoacid]:
                countDnonsyn =+1

Upvotes: 2

Views: 144

Answers (3)

Ben

Reputation: 6767

You need something like this:

countDsyn = 0
countDnonsyn = 0

for value1, value2 in zip(list1, list2):
    # Condition 2 in your question
    if value1 == 'nonsyn':
        continue
    # Condition 1 in your question
    if value1 in aminoacid.keys():
        if value1 == value2:
            continue
        elif aminoacid[value1] == aminoacid[value2]:
            countDsyn += 1
        else:
            countDnonsyn += 1
    # Condition 3 in your question
    else:
        if aminoacid[value2] == value1:
            countDsyn += 1
        else:
            countDnonsyn += 1
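
For anyone who wants to run this on its own, here is a minimal sketch of the same loop wrapped in a function. The name count_substitutions and the reduced codon_table (just the codons used in the example lists) are my own assumptions for illustration, not part of the question. Anything in list1 that is not a codon is treated as an amino acid letter.

```python
def count_substitutions(list1, list2, codon_table):
    """Count synonymous / non-synonymous differences, pairing items by position."""
    dsyn = dnonsyn = 0
    for value1, value2 in zip(list1, list2):
        # Condition 2 ('nonsyn') and condition 1a (identical codons): do nothing.
        if value1 == 'nonsyn' or value1 == value2:
            continue
        # A codon is translated; anything else is assumed to be an amino acid letter.
        residue1 = codon_table.get(value1, value1)
        residue2 = codon_table[value2]
        if residue1 == residue2:
            dsyn += 1
        else:
            dnonsyn += 1
    return dsyn, dnonsyn


# Hypothetical subset of the codon table, just enough for the example lists.
codon_table = {'ACT': 'T', 'ACC': 'T', 'GGT': 'G', 'ATT': 'I'}
list1 = ['ACT', 'ACT', 'nonsyn', 'G', 'L']
list2 = ['ACT', 'ACC', 'GGT', 'ATT']

print(count_substitutions(list1, list2, codon_table))  # (1, 1)
```

Note that zip stops at the shorter list, so the trailing 'L' in list1 is never compared. With strict position-by-position pairing the result is Dsyn = 1, Dnonsyn = 1; the Dsyn = 2 quoted in the question seems to assume that 'nonsyn' entries are dropped before pairing.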

Upvotes: 3

CDe

Reputation: 181

The answer by @Ben is pretty good; I have just two additions. Your aminoacid dictionary has a mistake: one of the glycine codons is entered as CGT, but it should be GGT. Also, instead of the condition

if len(value1) == 3:

you might want to use

if value1 in aminoacid.keys():

which directly tests whether your triplet is in the set of allowed codons. The first variant would also be true for a string such as '7&D'.
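
Typos like this are easy to make in a hand-typed dictionary, and Python silently keeps only the last value when a key is repeated in a dict literal. One way to guard against that, sketched below, is to generate the table from the standard genetic code written as a 64-letter string in TCAG order and compare a hand-typed table against it. The names standard_table and check_table are my own, and '*' is used for stop codons here instead of 'STOP'.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...; '*' is a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

all_codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
standard_table = dict(zip(all_codons, AMINO_ACIDS))


def check_table(table):
    """Return (missing codons, codons whose amino acid differs from the standard code)."""
    missing = sorted(c for c in all_codons if c not in table)
    wrong = sorted(
        c for c in table
        if c in standard_table
        and table[c].replace("STOP", "*") != standard_table[c]
    )
    return missing, wrong


# Reproduce the CGT/GGT mix-up on a copy of the generated table.
broken = dict(standard_table)
del broken['GGT']
broken['CGT'] = 'G'
print(check_table(broken))  # (['GGT'], ['CGT'])
```

As far as I can tell, the dictionary in the question also repeats the keys 'GCG' and 'AGC', so a check like this should report GCC and AGG as missing too.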

Upvotes: -1

Rok Povsic

Reputation: 4925

Try something like this:

cleared_list1 = [x for x in list1 if x != "nonsyn"]
cleared_list2 = [x for x in list2 if x != "nonsyn"]

cur_position = 0
for element1, element2 in zip(cleared_list1, cleared_list2):
    # Compare elements and update counters here.

    cur_position += 1

zip iterates over the two lists in parallel, pairing their elements by position, and stops at the end of the shorter list.
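
A possible way to fill in the loop body is sketched below. It assumes a codon table that actually contains GGT; the small codon_table here is a hypothetical subset covering only the example lists. Entries that are not codons are taken to be amino acid letters.

```python
# Hypothetical subset of the codon table, just enough for the example lists.
codon_table = {'ACT': 'T', 'ACC': 'T', 'GGT': 'G', 'ATT': 'I'}

list1 = ['ACT', 'ACT', 'nonsyn', 'G', 'L']
list2 = ['ACT', 'ACC', 'GGT', 'ATT']

# Drop 'nonsyn' first, so the remaining entries of list1 shift left by one.
cleared_list1 = [x for x in list1 if x != "nonsyn"]

count_dsyn = 0
count_dnonsyn = 0
for element1, element2 in zip(cleared_list1, list2):
    if element1 == element2:
        continue  # identical codons: nothing to count
    # Translate a codon; keep an amino acid letter as it is.
    residue1 = codon_table.get(element1, element1)
    if residue1 == codon_table[element2]:
        count_dsyn += 1
    else:
        count_dnonsyn += 1

print(count_dsyn, count_dnonsyn)  # 2 1
```

Because the filtering shifts positions, 'G' is compared with GGT and 'L' with ATT, which gives the Dsyn = 2, Dnonsyn = 1 stated in the question. Whether that shift is what you want depends on how list1 was produced.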

Upvotes: 1
