Fabián Montero

Reputation: 1794

Python dictionary match any element of key

If one has a dict like the following:

aminoacids = {
    ("GAU", "GAC"): "Asp",
    ("GAA", "GAG"): "Glu",
    ("UGU", "UGC"): "Cys",
}

Is there a way to match any of the elements of the key to obtain the value?

I have tried simply using get:

aminoacids.get("GAA")

Which returns None, as it doesn't strictly match a whole key, although "GAA" is contained in the key that has a value of "Glu".

Is there a way to get the following behavior:

>>> aminoacids = {
    ("GAU", "GAC"): "Asp",
    ("GAA", "GAG"): "Glu",
    ("UGU", "UGC"): "Cys",
}

>>> aminoacids.someFunction("GAA")
'Glu'

I've checked the docs but there doesn't seem to be a function for this.

Any ideas? Thanks!

Upvotes: 2

Views: 212

Answers (2)

Jamie Deith

Reputation: 714

If you're looking to make the setup a little more concise than the full 64-key dictionary, you could first set up a dictionary of lists (or tuples), one for each amino acid, and then build your base->amino lookup table along these lines:

from itertools import product
bases = 'UCAG'
amino_codes = dict(Asp=('GAU', 'GAC'),
                   Glu=('GAA', 'GAG'),
                   Leu=['CU'+b for b in bases] + ['UUA', 'UUG'],
                   Ser=['UC'+b for b in bases],  # list comprehensions just to show the concept
                   Stop=('UAA', 'UAG', 'UGA')  # You do the rest
                   )

bases_to_amino = {}
for triplet in product(bases, repeat=3):
    base_key = ''.join(triplet)
    amino = next((a for (a, pat) in amino_codes.items() if base_key in pat), 'missing')
    bases_to_amino[base_key] = amino

from pprint import pprint
pprint(bases_to_amino)

{'AAA': 'missing',
 'AAC': 'missing',
 'AAG': 'missing',
 'AAU': 'missing',
 'ACA': 'missing',
 'ACC': 'missing',
 'ACG': 'missing',
 'ACU': 'missing',
 'AGA': 'missing',
 'AGC': 'missing',
 'AGG': 'missing',
 'AGU': 'missing',
 'AUA': 'missing',
 'AUC': 'missing',
 'AUG': 'missing',
 'AUU': 'missing',
 'CAA': 'missing',
 'CAC': 'missing',
 'CAG': 'missing',
 'CAU': 'missing',
 'CCA': 'missing',
 'CCC': 'missing',
 'CCG': 'missing',
 'CCU': 'missing',
 'CGA': 'missing',
 'CGC': 'missing',
 'CGG': 'missing',
 'CGU': 'missing',
 'CUA': 'Leu',
 'CUC': 'Leu',
 'CUG': 'Leu',
 'CUU': 'Leu',
 'GAA': 'Glu',
 'GAC': 'Asp',
 'GAG': 'Glu',
 'GAU': 'Asp',
 'GCA': 'missing',
 'GCC': 'missing',
 'GCG': 'missing',
 'GCU': 'missing',
 'GGA': 'missing',
 'GGC': 'missing',
 'GGG': 'missing',
 'GGU': 'missing',
 'GUA': 'missing',
 'GUC': 'missing',
 'GUG': 'missing',
 'GUU': 'missing',
 'UAA': 'Stop',
 'UAC': 'missing',
 'UAG': 'Stop',
 'UAU': 'missing',
 'UCA': 'Ser',
 'UCC': 'Ser',
 'UCG': 'Ser',
 'UCU': 'Ser',
 'UGA': 'Stop',
 'UGC': 'missing',
 'UGG': 'missing',
 'UGU': 'missing',
 'UUA': 'Leu',
 'UUC': 'missing',
 'UUG': 'Leu',
 'UUU': 'missing'}

In this case you don't get all that much out of the exercise, but the concepts could be valuable for similar situations with more than 64 permutations to contend with.

Side notes:

  • a purist might not like the way I mixed tuples and lists in amino_codes. I'll accept this as a lesser crime.
  • amino = next(blah) is used to stop looking for a match once one is found.
  • you could compress the last 5 lines of code further with a nested generator expression. The wisdom of doing this is a matter of taste:

find_amino = lambda t: next((a for (a, pat) in amino_codes.items() if t in pat), 'missing')
bases_to_amino = dict((bk, find_amino(bk)) for bk in (''.join(t) for t in product(bases, repeat=3)))
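As a possible alternative to scanning with next(), here is a minimal sketch that inverts the name-to-codons mapping directly. The helper name invert_codes is hypothetical, and the sample data is only a three-entry assumption. Unlike the scan, which silently returns the first match, this version raises if a codon is listed under two amino acids.

```python
def invert_codes(amino_codes):
    """Build a codon -> amino acid dict, refusing codons listed twice."""
    table = {}
    for amino, codons in amino_codes.items():
        for codon in codons:
            if codon in table:
                raise ValueError(
                    f"{codon} listed for both {table[codon]} and {amino}"
                )
            table[codon] = amino
    return table


# Assumed sample data: only three amino acids, for illustration.
amino_codes = {
    "Asp": ("GAU", "GAC"),
    "Glu": ("GAA", "GAG"),
    "Stop": ("UAA", "UAG", "UGA"),
}

bases_to_amino = invert_codes(amino_codes)
print(bases_to_amino["GAG"])                 # Glu
print(bases_to_amino.get("AUG", "missing"))  # missing
```

Codons that are not listed are simply absent, so .get() with a default plays the role of the 'missing' placeholder.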

Upvotes: 1

juanpa.arrivillaga

Reputation: 95948

No, there isn't. Instead, restructure your dictionary so that each codon is its own key; it's fine for several keys to share the same value:

aminoacids = {
    "GAU": "Asp",
    "GAC": "Asp",
    "GAA": "Glu",
    "GAG": "Glu",
    "UGU": "Cys",
    "UGC": "Cys"
}
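If you would rather keep the tuple-keyed dict from the question as the source of truth, the flat form can be derived from it instead of typed out by hand. This is only a sketch; the name codon_to_amino is made up for illustration.

```python
aminoacids = {
    ("GAU", "GAC"): "Asp",
    ("GAA", "GAG"): "Glu",
    ("UGU", "UGC"): "Cys",
}

# Expand each tuple key so that every codon becomes its own key.
codon_to_amino = {
    codon: name
    for codons, name in aminoacids.items()
    for codon in codons
}

print(codon_to_amino.get("GAA"))  # Glu
print(codon_to_amino.get("AAA"))  # None
```

The flat dict is built once, after which every lookup is an ordinary constant-time get.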

Upvotes: 4
