Reputation: 171
I'm trying to calculate the GC content (in %) of a DNA sequence for a Rosalind question. I have the following code, but it returns 0, or only the number of G's alone or C's alone (no percentage).
x = raw_input("Sequence?:").upper()
total = len(x)
c = x.count("C")
g = x.count("G")
gc_total = g+c
gc_content = gc_total/total
print gc_content
I also tried this, just to get a count of G's and C's, and not the percentage, but it just returns a count of the entire string:
x = raw_input("Sequence?:").upper()
def gc(n):
    count = 0
    for i in n:
        if i == "C" or "G":
            count = count + 1
        else:
            count = count
    return count
gc(x)
EDIT: I fixed the typo in the print statement in the first example of code. That wasn't the problem, I just pasted the wrong snippet of code (there were many attempts...)
Upvotes: 2
Views: 28330
Reputation: 1
This may be helpful
import random
dna=''.join(random.choice('ATGCN') for i in range(2048))
print(dna)
print("A count",round((dna.count("A")/2048)*100),"%")
print("T count",round((dna.count("T")/2048)*100),"%")
print("G count",round((dna.count("G")/2048)*100),"%")
print("C count",round((dna.count("C")/2048)*100),"%")
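# dna.count("AT") would count the dinucleotide "AT", not the A and T bases, so sum the single-base counts
print("AT count",round(((dna.count("A")+dna.count("T"))/2048)*100),"%")
print("GC count",round(((dna.count("G")+dna.count("C"))/2048)*100),"%")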
Upvotes: 0
Reputation: 653
It may be too late, but Biopython (the Bio package) is a better fit for reading the sequences:
#!/usr/bin/env python
import sys
from Bio import SeqIO

filename = sys.argv[1]
fh = open(filename, 'r')
parser = SeqIO.parse(fh, "fasta")
for record in parser:
    c = 0
    a = 0
    g = 0
    t = 0
    for x in str(record.seq):
        if "C" in x:
            c += 1
        elif "G" in x:
            g += 1
        elif "A" in x:
            a += 1
        elif "T" in x:
            t += 1
    # 100.0 forces floating-point division under Python 2
    gc_content = (g + c) * 100.0 / (a + t + g + c)
    print("%s\t%.2f" % (filename, gc_content))
fh.close()
Upvotes: 0
Reputation: 1
import sys
orignfile = sys.argv[1]
outfile = sys.argv[2]
sequence = ""
with open(orignfile, 'r') as f:
    for line in f:
        if line.startswith('>'):
            seq_id = line.rstrip()[0:]
        else:
            sequence += line.rstrip()
GC_content = float((sequence.count('G') + sequence.count('C'))) / len(sequence) * 100
with open(outfile, 'a') as file_out:
    file_out.write("The GC content of '%s' is\t %.2f%%" % (seq_id, GC_content))
Upvotes: 0
Reputation: 51
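# This works for me.
# Note: it counts every character in the file, including any FASTA header lines.
import sys

filename = sys.argv[1]
fh = open(filename, 'r')
file = fh.read()
fh.close()
c = 0
a = 0
g = 0
t = 0
for x in file:
    if "C" in x:
        c += 1
    elif "G" in x:
        g += 1
    elif "A" in x:
        a += 1
    elif "T" in x:
        t += 1
print("C=%d, G=%d, A=%d, T=%d" % (c, g, a, t))
# 100.0 forces floating-point division under Python 2
gc_content = (g + c) * 100.0 / (a + t + g + c)
print("gc_content= %f" % (gc_content))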
Upvotes: 0
Reputation: 1521
You also need to multiply the answer by 100 to convert it to a percentage.
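As a rough sketch of that last step, here is the fraction-to-percentage conversion wrapped in a small function. It assumes Python 3 (where `/` is true division); the name `gc_percent` and the choice to return 0.0 for an empty sequence are illustrative assumptions, not part of the question.

```python
def gc_percent(seq):
    """Return the GC content of seq as a percentage (0-100)."""
    seq = seq.upper()
    if not seq:
        # Assumption: treat an empty sequence as 0% instead of dividing by zero.
        return 0.0
    gc_fraction = (seq.count("G") + seq.count("C")) / len(seq)
    return gc_fraction * 100  # fraction -> percentage


print(gc_percent("AGCTATAG"))  # 2 G + 1 C out of 8 bases -> 37.5
```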
Upvotes: 0
Reputation: 82
Shouldn't:
print cg_content
read
print gc_content?
As for the other snippet of code, your loop says
if i == "C" or "G":
Python parses this as (i == "C") or "G". A non-empty string such as "G" is always truthy, so the condition is true for every character and the loop counts the whole string.
Instead, it should read
if i == "C" or i=="G":
Also, you don't need that else statement.
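A minimal sketch of the corrected loop, assuming Python 3. The function name `count_gc` is made up for illustration, and the membership test `base in ("C", "G")` is just an equivalent way of writing the two comparisons.

```python
# What the original condition evaluates to for a non-matching base:
# ("A" == "C") is False, so `or` returns its right operand, the string "G".
print("A" == "C" or "G")  # G
print(bool("G"))          # True: any non-empty string is truthy


def count_gc(seq):
    """Count the C and G characters in seq."""
    count = 0
    for base in seq:
        # Same as: base == "C" or base == "G"
        if base in ("C", "G"):
            count += 1
    return count


print(count_gc("AGCTATAG"))  # 3
```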
Hope this helps. Let us know how it goes.
Abdul Sattar
Upvotes: 1
Reputation: 1736
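Your problem is that you are performing integer division, not floating-point division. In Python 2, dividing one int by another with / discards the fractional part, so any ratio below 1 becomes 0.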
Try
gc_content = gc_total / float(total)
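To illustrate the difference, here is a small sketch written for Python 3, where `//` behaves the way `/` does on two ints in Python 2. The numbers are made up (3 G/C bases in an 8-base sequence). In Python 2, putting `from __future__ import division` at the top of the file is another way to get true division from `/`.

```python
gc_total = 3  # illustrative values, not taken from the question
total = 8

floor_result = gc_total // total       # what Python 2's int / int produces
true_result = gc_total / float(total)  # the fix suggested above

print(floor_result)       # 0
print(true_result)        # 0.375
print(true_result * 100)  # 37.5
```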
Upvotes: 5