Reputation: 689
I have a text file as follows:
s1
MPPRRSIVEVKVLDVQKRRVPNKHYVYIIRVTWSSGATEAIYRRYSKFFDLQMQMLDKFP MEGGQKDPKQRIIPFLPGKILFRRSHIRDVAVKRLIPIDEYCKALIQLPPYISQCDEVLQ FFETRPEDLNPPKEEHIGKKKSGNDPTSVDPMVLEQYVVVADYQKQESSEISLSVGQVVD
s2
MAEVRKFTKRLSKPGTAAELRQSVSEAVRGSVVLEKAKLVEPLDYENVITQRKTQIYSDP LRDLLMFPMEDISISVIGRQRRTVQSTVPEDAEKRAQSLFVKECIKTYSTDWHVVNYKYE DFSGDFRMLPCKSLRPEKIPNHVFEIDEDCEKDEDSSSLCSQKGGVIKQGWLHKANVNST
. . .
I want to count the letter 'P' in each sequence. The output should be:
> s1:10
> s2:20
To achieve this, I wrote the following Python script:
    infile=open("file1.txt",'r')
    out=open("file2.csv",'w')
    for line in infile:
        line = line.strip("\n")
        if line.startswith('>'):
            name=line
        else:
            pattern = line.count('P')
            print '%s:%s' %(name,pattern)
            out.write('%s:%s\n' %(name,pattern))
It reads the file line by line and gives one result per line, as follows:
> s1:2
> s1:3
> s1:5
> s2:10
> s2:10
But I expect output as follows:
> s1:10
> s2:20 . . .
Can anybody help me with how to do this?
Thanks in Advance Ni
Upvotes: 1
Views: 309
Reputation: 70
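    infile = open("file1.txt", 'r')
    out = open("file2.csv", 'w')
    name = None
    total = 0
    for line in infile:
        line = line.strip("\n")
        if line.startswith('>'):
            # a new header: write the finished sequence, then reset the total
            if name is not None:
                out.write('%s:%s\n' % (name, total))
            name = line
            total = 0
        else:
            total += line.count('P')
    # this goes outside the for loop: write the last sequence
    if name is not None:
        out.write('%s:%s\n' % (name, total))
    infile.close()
    out.close()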
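The running-total idea can also be packaged as a small function, which makes it easy to try on a few in-memory lines before pointing it at the real file. This is only a minimal sketch, assuming every header line starts with '>' and every other line belongs to the most recent header; `count_per_sequence` and the sample lines are made up for illustration and are not from the question.

```python
from collections import OrderedDict


def count_per_sequence(lines, letter="P"):
    """Sum occurrences of `letter` over all lines that follow each '>' header."""
    totals = OrderedDict()  # keeps headers in the order they appear
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # New header: start a fresh total for this sequence.
            name = line
            totals[name] = 0
        elif name is not None:
            # Sequence line: add its count to the current header's total.
            totals[name] += line.count(letter)
    return totals


# Hypothetical sample data, two sequences split over several lines.
sample_lines = [">s1", "MPPRRS", "PKQP", ">s2", "MAEVP"]
for name, total in count_per_sequence(sample_lines).items():
    print("%s:%s" % (name, total))
```

An open file object can be passed in place of the list, since iterating over a file yields its lines. Note that a header repeated later in the file would reset that header's total in this sketch.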
Upvotes: 1
Reputation: 1742
Don't parse the file line by line. Just iterate over the entire file character by character, counting occurrences of the character you are looking for.
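A rough sketch of what that character-by-character scan might look like follows. It assumes '>' appears only at the start of a header, that the header runs to the end of its line, and that letters inside a header should not be counted; `count_letter_per_record` and the sample text are hypothetical names used only for illustration.

```python
def count_letter_per_record(text, letter="P"):
    """Scan `text` one character at a time; return a list of (header, count)."""
    results = []         # finished (header, count) pairs
    header_chars = None  # characters of the current header; None before the first one
    in_header = False
    count = 0
    for ch in text:
        if ch == ">" and not in_header:
            # A new record starts: store the previous one, if any.
            if header_chars is not None:
                results.append(("".join(header_chars), count))
            header_chars = [">"]
            in_header = True
            count = 0
        elif in_header:
            if ch == "\n":
                in_header = False  # header line ended, sequence follows
            else:
                header_chars.append(ch)
        elif ch == letter:
            count += 1
    # Store the last record once the text is exhausted.
    if header_chars is not None:
        results.append(("".join(header_chars), count))
    return results


# Hypothetical sample text; with a real file, pass the result of infile.read().
sample = ">s1\nMPPR\nAPKP\n>s2\nPAEV\n"
for header, count in count_letter_per_record(sample):
    print("%s:%s" % (header, count))
```

The scan still has to notice headers in order to reset the count, so it ends up doing the same bookkeeping as the line-based loop, just at a finer granularity.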
Upvotes: 1