Reputation: 833
I'm working on a simple nucleotide counter in Python 2.7. In one of the methods I wrote, I'd like to print the g, c, a, and t values sorted by the number of times they show up in the gene sheet. What would be a better way to do that? Thanks in advance!
def counting(self):
    gene = open("BRCA1.txt", "r")
    g = 0
    a = 0
    c = 0
    t = 0
    gene.readline()  # skip the first line of the file
    for line in gene:
        line = line.lower()
        for char in line:
            if char == "g":
                g += 1
            if char == "a":
                a += 1
            if char == "t":
                t += 1
            if char == "c":
                c += 1
    gene.close()
    print "number of g's: %d" % g
    print "number of c's: %d" % c
    print "number of a's: %d" % a
    print "number of t's: %d" % t
Upvotes: 1
Views: 92
Reputation: 37319
Use the collections.Counter class.
from collections import Counter

def counting(self):
    with open("BRCA1.txt", "r") as gene:
        nucleotide_counts = Counter(char for line in gene for char in line.lower().strip())
    for (nucleotide, count) in nucleotide_counts.most_common():
        print "number of %s's: %d" % (nucleotide, count)
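To see what most_common() gives you without needing the file, here is a minimal sketch on an in-memory string. The sample sequence is made up for illustration, and print is called with a single argument so the snippet runs on both Python 2.7 and Python 3.

```python
from collections import Counter

# Hypothetical in-memory sequence standing in for the file contents.
sample = "AAAGGTagtc"

# Counter accepts any iterable of hashable items, including a string.
counts = Counter(sample.lower())

# most_common() returns (item, count) pairs, highest count first.
ranked = counts.most_common()
for nucleotide, count in ranked:
    print("number of %s's: %d" % (nucleotide, count))
```

Note that this counts every character it sees, so anything in the input that isn't a nucleotide gets a count too.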
If your lines might contain things besides nucleotides, this should work:
from collections import Counter

def counting(self):
    nucleotides = frozenset(('g', 'a', 't', 'c'))
    with open("BRCA1.txt", "r") as gene:
        nucleotide_counts = Counter(char for line in gene for char in line.lower() if char in nucleotides)
    for (nucleotide, count) in nucleotide_counts.most_common():
        print "number of %s's: %d" % (nucleotide, count)
That version doesn't need strip because newlines and other whitespace would be excluded by checking set membership.
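One thing to be aware of with the filtered version: a nucleotide that never appears won't be in the Counter at all, so it won't be printed. If you'd rather always report all four, one possible variation is to seed the counter with zeros first. This is only a sketch; count_nucleotides and NUCLEOTIDES are names I made up, and the function takes any iterable of lines so it can be tried without the file.

```python
from collections import Counter

NUCLEOTIDES = "gatc"  # hypothetical constant: the characters worth counting

def count_nucleotides(lines):
    """Return (nucleotide, count) pairs, most frequent first."""
    # Seed every nucleotide with zero so absent ones are still reported.
    counts = Counter(dict.fromkeys(NUCLEOTIDES, 0))
    for line in lines:
        # Skip newlines, whitespace, and anything else that isn't g, a, t, or c.
        counts.update(char for char in line.lower() if char in NUCLEOTIDES)
    return counts.most_common()

for nucleotide, count in count_nucleotides(["AAAGGTN\n", "agt\n"]):
    print("number of %s's: %d" % (nucleotide, count))
```

Since a file object iterates over its lines, you could pass the open file straight in, e.g. count_nucleotides(gene) inside the with block.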
Upvotes: 4