Reputation: 326
My code below gives me all possible combinations of DNA. Is there a more efficient, cleaner way to do this? Also, for any bioinformatics or biotech programmers, which modules should I become most familiar with?
DNA = 'a', 't', 'g', 'c'

def all_combos():
    # Build every length-4 sequence over the DNA alphabet (4**4 = 256 results).
    lis = []
    for a in DNA:
        for t in DNA:
            for g in DNA:
                for c in DNA:
                    lis.append([a, t, g, c])
    return lis

print(all_combos())
Upvotes: 1
Views: 509
Reputation: 33
You can use a list comprehension. This example builds all 2-letter strings; add one more for clause per extra position:
bases = ['a', 't', 'c', 'g']
all_combos = [x+y for x in bases for y in bases]
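For the four-letter case in the question, the same idea can be stretched to four for clauses. This is only a minimal sketch, and the name bases is an illustrative choice:

```python
bases = ['a', 't', 'c', 'g']

# One `for` clause per position; string concatenation yields 'aaaa', 'aaat', ...
all_combos = [w + x + y + z
              for w in bases
              for x in bases
              for y in bases
              for z in bases]

print(len(all_combos))   # 256, i.e. 4**4
print(all_combos[:4])    # ['aaaa', 'aaat', 'aaac', 'aaag']
```

This gets unwieldy quickly as the length grows, which is where a helper that takes the length as a parameter becomes attractive.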
Upvotes: 0
Reputation: 10220
As a student's exercise, your code is readable and does what you want.
I guess the question is, why do you need all these combinations? Practical bioinformatics is, among other things, a mess of file types and formats, and you'll probably encounter some input data using a different alphabet than the one you're working with.
Regarding modules, there are two general-purpose ones I'll mention. The rest really depends on what specific task you're trying to accomplish. Biopython is the more mature and widely supported, but the code base is showing its age. scikit-bio is the new kid on the block with beautiful, fully tested code, but with fewer features and less support for obscure file formats.
Upvotes: 1
Reputation: 476669
You can use itertools.product to generate the list of all combinations. This will generate tuples instead of lists, but I guess that's fine?
from itertools import product
lis = list(product('atgc',repeat=4))
Here 4 means you want to construct 4-tuples.
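If lists or strings are more convenient than tuples, the items can be converted as they are produced. A small sketch along those lines (the variable names are illustrative only):

```python
from itertools import product

# Each item produced by product() is a tuple such as ('a', 'a', 'a', 't').
as_lists = [list(p) for p in product('atgc', repeat=4)]       # same shape as the nested-loop result
as_strings = [''.join(p) for p in product('atgc', repeat=4)]  # compact string form

print(as_lists[1])     # ['a', 'a', 'a', 't']
print(as_strings[1])   # aaat
```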
The algorithm is of course not faster, complexity-wise, than using the for loops, since it is inherently O(m^n), with m the number of elements (len('atgc')) and n=4 (the number of elements per tuple). Both approaches are equally fast in big-O terms (although there can be differences in practice).
This yields:
>>> list(product('atgc',repeat=4))
[('a', 'a', 'a', 'a'), ('a', 'a', 'a', 't'), ('a', 'a', 'a', 'g'), ('a', 'a', 'a', 'c'), ('a', 'a', 't', 'a'), ('a', 'a', 't', 't'), ('a', 'a', 't', 'g'), ('a', 'a', 't', 'c'), ('a', 'a', 'g', 'a'), ('a', 'a', 'g', 't'), ('a', 'a', 'g', 'g'), ('a', 'a', 'g', 'c'), ('a', 'a', 'c', 'a'), ('a', 'a', 'c', 't'), ('a', 'a', 'c', 'g'), ('a', 'a', 'c', 'c'), ('a', 't', 'a', 'a'), ('a', 't', 'a', 't'), ('a', 't', 'a', 'g'), ('a', 't', 'a', 'c'), ('a', 't', 't', 'a'), ('a', 't', 't', 't'), ('a', 't', 't', 'g'), ('a', 't', 't', 'c'), ('a', 't', 'g', 'a'), ('a', 't', 'g', 't'), ('a', 't', 'g', 'g'), ('a', 't', 'g', 'c'), ('a', 't', 'c', 'a'), ('a', 't', 'c', 't'), ('a', 't', 'c', 'g'), ('a', 't', 'c', 'c'), ('a', 'g', 'a', 'a'), ('a', 'g', 'a', 't'), ('a', 'g', 'a', 'g'), ('a', 'g', 'a', 'c'), ('a', 'g', 't', 'a'), ('a', 'g', 't', 't'), ('a', 'g', 't', 'g'), ('a', 'g', 't', 'c'), ('a', 'g', 'g', 'a'), ('a', 'g', 'g', 't'), ('a', 'g', 'g', 'g'), ('a', 'g', 'g', 'c'), ('a', 'g', 'c', 'a'), ('a', 'g', 'c', 't'), ('a', 'g', 'c', 'g'), ('a', 'g', 'c', 'c'), ('a', 'c', 'a', 'a'), ('a', 'c', 'a', 't'), ('a', 'c', 'a', 'g'), ('a', 'c', 'a', 'c'), ('a', 'c', 't', 'a'), ('a', 'c', 't', 't'), ('a', 'c', 't', 'g'), ('a', 'c', 't', 'c'), ('a', 'c', 'g', 'a'), ('a', 'c', 'g', 't'), ('a', 'c', 'g', 'g'), ('a', 'c', 'g', 'c'), ('a', 'c', 'c', 'a'), ('a', 'c', 'c', 't'), ('a', 'c', 'c', 'g'), ('a', 'c', 'c', 'c'), ('t', 'a', 'a', 'a'), ('t', 'a', 'a', 't'), ('t', 'a', 'a', 'g'), ('t', 'a', 'a', 'c'), ('t', 'a', 't', 'a'), ('t', 'a', 't', 't'), ('t', 'a', 't', 'g'), ('t', 'a', 't', 'c'), ('t', 'a', 'g', 'a'), ('t', 'a', 'g', 't'), ('t', 'a', 'g', 'g'), ('t', 'a', 'g', 'c'), ('t', 'a', 'c', 'a'), ('t', 'a', 'c', 't'), ('t', 'a', 'c', 'g'), ('t', 'a', 'c', 'c'), ('t', 't', 'a', 'a'), ('t', 't', 'a', 't'), ('t', 't', 'a', 'g'), ('t', 't', 'a', 'c'), ('t', 't', 't', 'a'), ('t', 't', 't', 't'), ('t', 't', 't', 'g'), ('t', 't', 't', 'c'), ('t', 't', 'g', 'a'), ('t', 't', 'g', 't'), ('t', 't', 'g', 
'g'), ('t', 't', 'g', 'c'), ('t', 't', 'c', 'a'), ('t', 't', 'c', 't'), ('t', 't', 'c', 'g'), ('t', 't', 'c', 'c'), ('t', 'g', 'a', 'a'), ('t', 'g', 'a', 't'), ('t', 'g', 'a', 'g'), ('t', 'g', 'a', 'c'), ('t', 'g', 't', 'a'), ('t', 'g', 't', 't'), ('t', 'g', 't', 'g'), ('t', 'g', 't', 'c'), ('t', 'g', 'g', 'a'), ('t', 'g', 'g', 't'), ('t', 'g', 'g', 'g'), ('t', 'g', 'g', 'c'), ('t', 'g', 'c', 'a'), ('t', 'g', 'c', 't'), ('t', 'g', 'c', 'g'), ('t', 'g', 'c', 'c'), ('t', 'c', 'a', 'a'), ('t', 'c', 'a', 't'), ('t', 'c', 'a', 'g'), ('t', 'c', 'a', 'c'), ('t', 'c', 't', 'a'), ('t', 'c', 't', 't'), ('t', 'c', 't', 'g'), ('t', 'c', 't', 'c'), ('t', 'c', 'g', 'a'), ('t', 'c', 'g', 't'), ('t', 'c', 'g', 'g'), ('t', 'c', 'g', 'c'), ('t', 'c', 'c', 'a'), ('t', 'c', 'c', 't'), ('t', 'c', 'c', 'g'), ('t', 'c', 'c', 'c'), ('g', 'a', 'a', 'a'), ('g', 'a', 'a', 't'), ('g', 'a', 'a', 'g'), ('g', 'a', 'a', 'c'), ('g', 'a', 't', 'a'), ('g', 'a', 't', 't'), ('g', 'a', 't', 'g'), ('g', 'a', 't', 'c'), ('g', 'a', 'g', 'a'), ('g', 'a', 'g', 't'), ('g', 'a', 'g', 'g'), ('g', 'a', 'g', 'c'), ('g', 'a', 'c', 'a'), ('g', 'a', 'c', 't'), ('g', 'a', 'c', 'g'), ('g', 'a', 'c', 'c'), ('g', 't', 'a', 'a'), ('g', 't', 'a', 't'), ('g', 't', 'a', 'g'), ('g', 't', 'a', 'c'), ('g', 't', 't', 'a'), ('g', 't', 't', 't'), ('g', 't', 't', 'g'), ('g', 't', 't', 'c'), ('g', 't', 'g', 'a'), ('g', 't', 'g', 't'), ('g', 't', 'g', 'g'), ('g', 't', 'g', 'c'), ('g', 't', 'c', 'a'), ('g', 't', 'c', 't'), ('g', 't', 'c', 'g'), ('g', 't', 'c', 'c'), ('g', 'g', 'a', 'a'), ('g', 'g', 'a', 't'), ('g', 'g', 'a', 'g'), ('g', 'g', 'a', 'c'), ('g', 'g', 't', 'a'), ('g', 'g', 't', 't'), ('g', 'g', 't', 'g'), ('g', 'g', 't', 'c'), ('g', 'g', 'g', 'a'), ('g', 'g', 'g', 't'), ('g', 'g', 'g', 'g'), ('g', 'g', 'g', 'c'), ('g', 'g', 'c', 'a'), ('g', 'g', 'c', 't'), ('g', 'g', 'c', 'g'), ('g', 'g', 'c', 'c'), ('g', 'c', 'a', 'a'), ('g', 'c', 'a', 't'), ('g', 'c', 'a', 'g'), ('g', 'c', 'a', 'c'), ('g', 'c', 't', 'a'), ('g', 'c', 
't', 't'), ('g', 'c', 't', 'g'), ('g', 'c', 't', 'c'), ('g', 'c', 'g', 'a'), ('g', 'c', 'g', 't'), ('g', 'c', 'g', 'g'), ('g', 'c', 'g', 'c'), ('g', 'c', 'c', 'a'), ('g', 'c', 'c', 't'), ('g', 'c', 'c', 'g'), ('g', 'c', 'c', 'c'), ('c', 'a', 'a', 'a'), ('c', 'a', 'a', 't'), ('c', 'a', 'a', 'g'), ('c', 'a', 'a', 'c'), ('c', 'a', 't', 'a'), ('c', 'a', 't', 't'), ('c', 'a', 't', 'g'), ('c', 'a', 't', 'c'), ('c', 'a', 'g', 'a'), ('c', 'a', 'g', 't'), ('c', 'a', 'g', 'g'), ('c', 'a', 'g', 'c'), ('c', 'a', 'c', 'a'), ('c', 'a', 'c', 't'), ('c', 'a', 'c', 'g'), ('c', 'a', 'c', 'c'), ('c', 't', 'a', 'a'), ('c', 't', 'a', 't'), ('c', 't', 'a', 'g'), ('c', 't', 'a', 'c'), ('c', 't', 't', 'a'), ('c', 't', 't', 't'), ('c', 't', 't', 'g'), ('c', 't', 't', 'c'), ('c', 't', 'g', 'a'), ('c', 't', 'g', 't'), ('c', 't', 'g', 'g'), ('c', 't', 'g', 'c'), ('c', 't', 'c', 'a'), ('c', 't', 'c', 't'), ('c', 't', 'c', 'g'), ('c', 't', 'c', 'c'), ('c', 'g', 'a', 'a'), ('c', 'g', 'a', 't'), ('c', 'g', 'a', 'g'), ('c', 'g', 'a', 'c'), ('c', 'g', 't', 'a'), ('c', 'g', 't', 't'), ('c', 'g', 't', 'g'), ('c', 'g', 't', 'c'), ('c', 'g', 'g', 'a'), ('c', 'g', 'g', 't'), ('c', 'g', 'g', 'g'), ('c', 'g', 'g', 'c'), ('c', 'g', 'c', 'a'), ('c', 'g', 'c', 't'), ('c', 'g', 'c', 'g'), ('c', 'g', 'c', 'c'), ('c', 'c', 'a', 'a'), ('c', 'c', 'a', 't'), ('c', 'c', 'a', 'g'), ('c', 'c', 'a', 'c'), ('c', 'c', 't', 'a'), ('c', 'c', 't', 't'), ('c', 'c', 't', 'g'), ('c', 'c', 't', 'c'), ('c', 'c', 'g', 'a'), ('c', 'c', 'g', 't'), ('c', 'c', 'g', 'g'), ('c', 'c', 'g', 'c'), ('c', 'c', 'c', 'a'), ('c', 'c', 'c', 't'), ('c', 'c', 'c', 'g'), ('c', 'c', 'c', 'c')]
Mind that itertools functions usually work lazily: they return an iterator. Since O(m^n) blows up fast, it can be useful to iterate over the result directly instead of constructing a list: in that case you at least save on memory. Furthermore, if n is large (like 16 or larger for m=4), a computer will usually start having difficulty processing all the elements.
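To illustrate the lazy behaviour, the iterator can be consumed piece by piece without ever building the full list. A sketch assuming only the first few results are needed; n = 16 is chosen to match the figure mentioned above, and the names alphabet and kmers are illustrative:

```python
from itertools import islice, product

alphabet = 'atgc'
n = 16

# The total count is m**n; computing it does not require generating anything.
print(len(alphabet) ** n)   # 4294967296

# product() returns an iterator, so only the items actually requested are built.
kmers = (''.join(p) for p in product(alphabet, repeat=n))
first_three = list(islice(kmers, 3))
print(first_three)
```

Calling list() on kmers here would try to hold over four billion strings in memory, which is the situation the generator avoids.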
Upvotes: 5
Reputation: 891
Guess I was beaten while trying this out .. will leave it up just for my statistics.
If what you want is actually every possible length-4 sequence with repetition allowed (i.e. aaaa, aaat, aaag, aaac, ...), you can use itertools this way:
from itertools import product
print(list(product('atgc', repeat=4)))
Upvotes: 2
Reputation: 4455
There is a Python function for generating combinations from a list: itertools.combinations
A person was trying to list all combinations of a list taken two at a time in this post: Python - list the combination pair for a function value
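Note that itertools.combinations picks elements without repetition and ignores order, so it does not reproduce the output of the nested loops in the question. A quick comparison, as a sketch:

```python
from itertools import combinations, product

pairs_no_repeat = list(combinations('atgc', 2))
pairs_with_repeat = list(product('atgc', repeat=2))

print(pairs_no_repeat)
# [('a', 't'), ('a', 'g'), ('a', 'c'), ('t', 'g'), ('t', 'c'), ('g', 'c')]
print(len(pairs_with_repeat))   # 16

# With r equal to the alphabet size there is only one combination.
print(list(combinations('atgc', 4)))   # [('a', 't', 'g', 'c')]
```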
Upvotes: 1