bnicholl
bnicholl

Reputation: 326

return all possible combos of DNA

My code below gives me all possible combinations of DNA. Is there a more efficient, cleaner way to do this? Also, for any bioinformatics or biotech programmers, which modules should I become most familiar with?

DNA = 'a', 't', 'g', 'c'


lis = []
def all_combos():
    for a in A:
        for t in A:
            for g in A:
                for c in A:  
                    lis.append([a, t, g, c])
    return lis

print(all_combos())

Upvotes: 1

Views: 509

Answers (5)

Leo blubla
Leo blubla

Reputation: 33

list comprehension:

proteins = ['a', 't', 'c', 'g']
all_combos = [x+y for x in proteins for y in proteins]

Upvotes: 0

Erik Cederstrand
Erik Cederstrand

Reputation: 10220

As a student's exercise, your code is readable and does what you want.

I guess the question is, why do you need all these combinations? Practical bioinformatics is, among other things, a mess of file types and formats, and you'll probably encounter some input data using a different alphabet than the one you're working with.

Regarding modules, there are two general-purpose I'll mention. The rest really depends on what specific task you're trying to accomplish. Biopython is the more mature and widely supported, but the code base is it's showing it's age. scikit-bio is the new kid on the block with beautiful, fully tested code, but with less features and less support for obscure file formats.

Upvotes: 1

willeM_ Van Onsem
willeM_ Van Onsem

Reputation: 476669

You can use itertools.product to generate the list of all combinations. This will generate tuples instead of lists, but I guess that's fine?

from itertools import product

lis = list(product('atgc',repeat=4))

Here 4 means you want to construct 4-tuples.

The algorithm is of course not faster - complexity-wise - than using the for loops, since it is inherently O(mn) with m the number of elements (len('atgc')) and n=4 (the number of elements per tuple). Both algorithms are in terms of big-oh equally fast (although there can be differences).

This yields:

>>> list(product('atgc',repeat=4))
[('a', 'a', 'a', 'a'), ('a', 'a', 'a', 't'), ('a', 'a', 'a', 'g'), ('a', 'a', 'a', 'c'), ('a', 'a', 't', 'a'), ('a', 'a', 't', 't'), ('a', 'a', 't', 'g'), ('a', 'a', 't', 'c'), ('a', 'a', 'g', 'a'), ('a', 'a', 'g', 't'), ('a', 'a', 'g', 'g'), ('a', 'a', 'g', 'c'), ('a', 'a', 'c', 'a'), ('a', 'a', 'c', 't'), ('a', 'a', 'c', 'g'), ('a', 'a', 'c', 'c'), ('a', 't', 'a', 'a'), ('a', 't', 'a', 't'), ('a', 't', 'a', 'g'), ('a', 't', 'a', 'c'), ('a', 't', 't', 'a'), ('a', 't', 't', 't'), ('a', 't', 't', 'g'), ('a', 't', 't', 'c'), ('a', 't', 'g', 'a'), ('a', 't', 'g', 't'), ('a', 't', 'g', 'g'), ('a', 't', 'g', 'c'), ('a', 't', 'c', 'a'), ('a', 't', 'c', 't'), ('a', 't', 'c', 'g'), ('a', 't', 'c', 'c'), ('a', 'g', 'a', 'a'), ('a', 'g', 'a', 't'), ('a', 'g', 'a', 'g'), ('a', 'g', 'a', 'c'), ('a', 'g', 't', 'a'), ('a', 'g', 't', 't'), ('a', 'g', 't', 'g'), ('a', 'g', 't', 'c'), ('a', 'g', 'g', 'a'), ('a', 'g', 'g', 't'), ('a', 'g', 'g', 'g'), ('a', 'g', 'g', 'c'), ('a', 'g', 'c', 'a'), ('a', 'g', 'c', 't'), ('a', 'g', 'c', 'g'), ('a', 'g', 'c', 'c'), ('a', 'c', 'a', 'a'), ('a', 'c', 'a', 't'), ('a', 'c', 'a', 'g'), ('a', 'c', 'a', 'c'), ('a', 'c', 't', 'a'), ('a', 'c', 't', 't'), ('a', 'c', 't', 'g'), ('a', 'c', 't', 'c'), ('a', 'c', 'g', 'a'), ('a', 'c', 'g', 't'), ('a', 'c', 'g', 'g'), ('a', 'c', 'g', 'c'), ('a', 'c', 'c', 'a'), ('a', 'c', 'c', 't'), ('a', 'c', 'c', 'g'), ('a', 'c', 'c', 'c'), ('t', 'a', 'a', 'a'), ('t', 'a', 'a', 't'), ('t', 'a', 'a', 'g'), ('t', 'a', 'a', 'c'), ('t', 'a', 't', 'a'), ('t', 'a', 't', 't'), ('t', 'a', 't', 'g'), ('t', 'a', 't', 'c'), ('t', 'a', 'g', 'a'), ('t', 'a', 'g', 't'), ('t', 'a', 'g', 'g'), ('t', 'a', 'g', 'c'), ('t', 'a', 'c', 'a'), ('t', 'a', 'c', 't'), ('t', 'a', 'c', 'g'), ('t', 'a', 'c', 'c'), ('t', 't', 'a', 'a'), ('t', 't', 'a', 't'), ('t', 't', 'a', 'g'), ('t', 't', 'a', 'c'), ('t', 't', 't', 'a'), ('t', 't', 't', 't'), ('t', 't', 't', 'g'), ('t', 't', 't', 'c'), ('t', 't', 'g', 'a'), ('t', 't', 'g', 't'), ('t', 't', 'g', 'g'), ('t', 't', 'g', 'c'), ('t', 't', 'c', 'a'), ('t', 't', 'c', 't'), ('t', 't', 'c', 'g'), ('t', 't', 'c', 'c'), ('t', 'g', 'a', 'a'), ('t', 'g', 'a', 't'), ('t', 'g', 'a', 'g'), ('t', 'g', 'a', 'c'), ('t', 'g', 't', 'a'), ('t', 'g', 't', 't'), ('t', 'g', 't', 'g'), ('t', 'g', 't', 'c'), ('t', 'g', 'g', 'a'), ('t', 'g', 'g', 't'), ('t', 'g', 'g', 'g'), ('t', 'g', 'g', 'c'), ('t', 'g', 'c', 'a'), ('t', 'g', 'c', 't'), ('t', 'g', 'c', 'g'), ('t', 'g', 'c', 'c'), ('t', 'c', 'a', 'a'), ('t', 'c', 'a', 't'), ('t', 'c', 'a', 'g'), ('t', 'c', 'a', 'c'), ('t', 'c', 't', 'a'), ('t', 'c', 't', 't'), ('t', 'c', 't', 'g'), ('t', 'c', 't', 'c'), ('t', 'c', 'g', 'a'), ('t', 'c', 'g', 't'), ('t', 'c', 'g', 'g'), ('t', 'c', 'g', 'c'), ('t', 'c', 'c', 'a'), ('t', 'c', 'c', 't'), ('t', 'c', 'c', 'g'), ('t', 'c', 'c', 'c'), ('g', 'a', 'a', 'a'), ('g', 'a', 'a', 't'), ('g', 'a', 'a', 'g'), ('g', 'a', 'a', 'c'), ('g', 'a', 't', 'a'), ('g', 'a', 't', 't'), ('g', 'a', 't', 'g'), ('g', 'a', 't', 'c'), ('g', 'a', 'g', 'a'), ('g', 'a', 'g', 't'), ('g', 'a', 'g', 'g'), ('g', 'a', 'g', 'c'), ('g', 'a', 'c', 'a'), ('g', 'a', 'c', 't'), ('g', 'a', 'c', 'g'), ('g', 'a', 'c', 'c'), ('g', 't', 'a', 'a'), ('g', 't', 'a', 't'), ('g', 't', 'a', 'g'), ('g', 't', 'a', 'c'), ('g', 't', 't', 'a'), ('g', 't', 't', 't'), ('g', 't', 't', 'g'), ('g', 't', 't', 'c'), ('g', 't', 'g', 'a'), ('g', 't', 'g', 't'), ('g', 't', 'g', 'g'), ('g', 't', 'g', 'c'), ('g', 't', 'c', 'a'), ('g', 't', 'c', 't'), ('g', 't', 'c', 'g'), ('g', 't', 'c', 'c'), ('g', 'g', 'a', 'a'), ('g', 'g', 'a', 't'), ('g', 'g', 'a', 'g'), ('g', 'g', 'a', 'c'), ('g', 'g', 't', 'a'), ('g', 'g', 't', 't'), ('g', 'g', 't', 'g'), ('g', 'g', 't', 'c'), ('g', 'g', 'g', 'a'), ('g', 'g', 'g', 't'), ('g', 'g', 'g', 'g'), ('g', 'g', 'g', 'c'), ('g', 'g', 'c', 'a'), ('g', 'g', 'c', 't'), ('g', 'g', 'c', 'g'), ('g', 'g', 'c', 'c'), ('g', 'c', 'a', 'a'), ('g', 'c', 'a', 't'), ('g', 'c', 'a', 'g'), ('g', 'c', 'a', 'c'), ('g', 'c', 't', 'a'), ('g', 'c', 't', 't'), ('g', 'c', 't', 'g'), ('g', 'c', 't', 'c'), ('g', 'c', 'g', 'a'), ('g', 'c', 'g', 't'), ('g', 'c', 'g', 'g'), ('g', 'c', 'g', 'c'), ('g', 'c', 'c', 'a'), ('g', 'c', 'c', 't'), ('g', 'c', 'c', 'g'), ('g', 'c', 'c', 'c'), ('c', 'a', 'a', 'a'), ('c', 'a', 'a', 't'), ('c', 'a', 'a', 'g'), ('c', 'a', 'a', 'c'), ('c', 'a', 't', 'a'), ('c', 'a', 't', 't'), ('c', 'a', 't', 'g'), ('c', 'a', 't', 'c'), ('c', 'a', 'g', 'a'), ('c', 'a', 'g', 't'), ('c', 'a', 'g', 'g'), ('c', 'a', 'g', 'c'), ('c', 'a', 'c', 'a'), ('c', 'a', 'c', 't'), ('c', 'a', 'c', 'g'), ('c', 'a', 'c', 'c'), ('c', 't', 'a', 'a'), ('c', 't', 'a', 't'), ('c', 't', 'a', 'g'), ('c', 't', 'a', 'c'), ('c', 't', 't', 'a'), ('c', 't', 't', 't'), ('c', 't', 't', 'g'), ('c', 't', 't', 'c'), ('c', 't', 'g', 'a'), ('c', 't', 'g', 't'), ('c', 't', 'g', 'g'), ('c', 't', 'g', 'c'), ('c', 't', 'c', 'a'), ('c', 't', 'c', 't'), ('c', 't', 'c', 'g'), ('c', 't', 'c', 'c'), ('c', 'g', 'a', 'a'), ('c', 'g', 'a', 't'), ('c', 'g', 'a', 'g'), ('c', 'g', 'a', 'c'), ('c', 'g', 't', 'a'), ('c', 'g', 't', 't'), ('c', 'g', 't', 'g'), ('c', 'g', 't', 'c'), ('c', 'g', 'g', 'a'), ('c', 'g', 'g', 't'), ('c', 'g', 'g', 'g'), ('c', 'g', 'g', 'c'), ('c', 'g', 'c', 'a'), ('c', 'g', 'c', 't'), ('c', 'g', 'c', 'g'), ('c', 'g', 'c', 'c'), ('c', 'c', 'a', 'a'), ('c', 'c', 'a', 't'), ('c', 'c', 'a', 'g'), ('c', 'c', 'a', 'c'), ('c', 'c', 't', 'a'), ('c', 'c', 't', 't'), ('c', 'c', 't', 'g'), ('c', 'c', 't', 'c'), ('c', 'c', 'g', 'a'), ('c', 'c', 'g', 't'), ('c', 'c', 'g', 'g'), ('c', 'c', 'g', 'c'), ('c', 'c', 'c', 'a'), ('c', 'c', 'c', 't'), ('c', 'c', 'c', 'g'), ('c', 'c', 'c', 'c')]

Mind that itertools usually work lazily: they do return an iterator. Since O(mn) usually blows up fast, it can therefore be useful to use a generator instead of constructing a list: in that case at least you save on memory. Furthermore if n is large (like 16 or larger for m=4), usually a computer will start having difficulty processing the elements.

Upvotes: 5

nick
nick

Reputation: 891

Guess I was beaten while trying this out .. will leave it up just for my statistics.

If what you want is actually all possible permutations (i.e aaaa, aaat, aaag, aaac... ), you can use itertools this way:

from itertools import product

print(list(product('atgc', repeat=4)))

Upvotes: 2

Mark
Mark

Reputation: 4455

There is a python function for generating combinations from a list:

itertools.combinations

A person was trying to list all combinations of a list taken two at a time in this post: Python - list the combination pair for a function value

Upvotes: 1

Related Questions