Reputation: 278
I need to count characters in a string and find the percentage of certain characters in it. My code doesn't work properly: it prints the percentage line by line, when I only want the final result for each entry. This is my code:
from os import cpu_count
import re
with open('ros_gc_948_1_dataset.txt', 'r') as fp:
    # read all lines in a list
    lines = fp.readlines()
    c_count=0
    g_count=0
    a_count=0
    t_count=0
    for line in lines:
        # check if string present on a current line
        if re.findall(r"\bRos_[0-9][0-9][0-9][0-9]" , line):
            line.write()
            line.replace(">","")
            print(line)
            c_count=0
            g_count=0
            a_count=0
            t_count=0
        else:
            c_count+=line.count("C")
            g_count+=line.count("G")
            a_count+=line.count("A")
            t_count+=line.count("T")
        tocag=c_count+g_count
        toac=c_count + g_count + a_count + t_count
        if toac !=0:
            avrg=float(tocag/toac * 100)
            print(avrg)
And the result:
>Ros_7657
53.333333333333336
43.333333333333336
47.22222222222222
48.333333333333336
49.0
49.72222222222222
49.76190476190476
48.541666666666664
48.333333333333336
48.833333333333336
49.24242424242424
49.30555555555556
48.717948717948715
49.25925925925926
>Ros_3487
53.333333333333336
47.5
46.666666666666664
48.333333333333336
47.0
46.94444444444444
48.095238095238095
47.291666666666664
46.111111111111114
47.333333333333336
46.96969696969697
46.52777777777778
46.666666666666664
46.785714285714285
47.368421052631575
My expected result:
Ros_7657
49.25925925925926
Ros_3487
47.368421052631575
I have a text file containing several of these entries, and I just want their names and percentages.
Example input:
>Ros_2115
GAGGCAATGGTTATCAACCCCTGATTTACGAATGACCTAACAACTCCTTAGAATTTAATC
GTTATGTGAATTAAGCAACGCTCGCGAATTGCTATGTTAATTCGCACTGTAAGGTGTCGA
ACGAAATCCACTGTTCCTTTTCTAATTTCTTTCA
Thanks for your help.
Upvotes: 0
Views: 136
Reputation: 278
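Best solution:
with open('rosalind_gc_948_1_dataset.txt', 'r') as fp:
    lines = fp.readlines()

# map each entry name to its concatenated sequence
# (named "sequences" so the built-in dict is not shadowed)
sequences = {}
key = None
for line in lines:
    if line.startswith('>'):
        # header line: drop the leading ">" and the trailing newline
        key = line[1:].strip()
        sequences[key] = ""
    elif key is not None:
        # sequence line: strip() also copes with a last line without "\n"
        sequences[key] += line.strip()

for key, value in sequences.items():
    if value:  # skip empty entries to avoid dividing by zero
        gc = (value.count("C") + value.count("G")) / len(value) * 100
        print(key)
        print("{:.6f}".format(gc))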
Upvotes: 0
Reputation: 3409
This should work:
...
toac = 0
for line in lines:
    # check if string present on a current line
    if re.findall(r"\bRos_[0-9][0-9][0-9][0-9]", line):
        # print count from the last series
        if toac != 0:
            avrg = float(tocag/toac * 100)
            print(avrg)
        line.write()
...
At the end, keep the if ... print block, unindented so that it runs after the loop, to print the last series. (Note that this block of code is then repeated, which is not elegant, but I can't find a more elegant solution at the moment.)
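One way to avoid repeating the block is to move the calculation into a small helper and call it both when a new header appears and once after the loop. A minimal sketch, assuming the entries look like the example input in the question; the names gc_percent and gc_by_record and the sample data are made up for illustration and are not from the original code:

```python
def gc_percent(c, g, a, t):
    """Return the GC percentage, or None when no bases were counted."""
    total = c + g + a + t
    return (c + g) / total * 100 if total else None


def gc_by_record(lines):
    """Collect (name, GC percentage) pairs, one per '>' header."""
    results = []
    name = None
    c = g = a = t = 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous entry, so report it first.
            if name is not None:
                results.append((name, gc_percent(c, g, a, t)))
            name = line[1:]
            c = g = a = t = 0
        else:
            c += line.count("C")
            g += line.count("G")
            a += line.count("A")
            t += line.count("T")
    # The last entry has no following header, so report it after the loop.
    if name is not None:
        results.append((name, gc_percent(c, g, a, t)))
    return results


# Made-up sample lines, not taken from the real dataset.
sample = [">Ros_0001\n", "GGCC\n", "AATT\n", ">Ros_0002\n", "GCAT\n", "AAAA\n"]
for name, pct in gc_by_record(sample):
    print(name)
    print(pct)
```

The report step still happens in two places, but each is now a one-line call, and the percentage formula lives in a single function.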
This seems better (provided all your genetic blocks start with ">Ros_nnnn"):
with open('ros_gc_948_1_dataset.txt', 'r') as fp:
    # join all lines in one string, then split around ">"
    lines = "".join(fp.readlines()).split(">")
for line in lines:
    # skip empty lines
    if not line.strip():
        continue
    # print the 1st 8 characters ('Ros_xxxx')
    print(line[:8])
    c_count = line.count("C")
    g_count = line.count("G")
    a_count = line.count("A")
    t_count = line.count("T")
    tocag = c_count + g_count
    toac = c_count + g_count + a_count + t_count
    if toac != 0:
        avrg = float(tocag/toac * 100)
        print(avrg)
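The fixed line[:8] slice only works when every name is exactly eight characters long, and counting over the whole chunk also counts any A, C, G or T that happens to appear in the name. A possible variant, sketched under the assumption that each entry is a ">" header line followed by sequence lines, takes the name from the first line of each chunk and counts only the rest; gc_from_text is a hypothetical helper name and the sample text is made up:

```python
def gc_from_text(text):
    """Return {entry name: GC percentage} for text split around '>'."""
    result = {}
    for chunk in text.split(">"):
        # Skip the empty chunk that precedes the first ">".
        if not chunk.strip():
            continue
        # The first line is the name; the remaining lines are the sequence.
        name, _, body = chunk.partition("\n")
        seq = "".join(body.split())
        total = sum(seq.count(base) for base in "ACGT")
        if total:
            gc = seq.count("G") + seq.count("C")
            result[name.strip()] = gc / total * 100
    return result


# Made-up sample data, not taken from the real dataset.
text = ">Ros_0001\nGGCC\nAATT\n>Sample_42\nGCAT\nAAAA\n"
for name, pct in gc_from_text(text).items():
    print(name)
    print("{:.6f}".format(pct))
```

With a real file, the text argument could come from fp.read() inside the with block.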
Upvotes: 1
Reputation: 2302
Move the print statement to the left so that it is aligned with the start of the for loop; it then runs once after the loop instead of on every line.
Also, print("{:.2f}%".format(avrg)) rounds the value to 2 decimals. Change the 2 to however many decimals you want.
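For illustration, here is how that format specification behaves on the last value from the question's output. Note that the .Nf format rounds to N decimals rather than cutting digits off. The sketch uses only the built-in str.format; the value 2.678 is just an example chosen to show the rounding:

```python
avrg = 47.368421052631575  # last value from the question's output

print("{:.2f}%".format(avrg))  # two decimals, with a percent sign
print("{:.6f}".format(avrg))   # six decimals
print("{:.2f}".format(2.678))  # rounds to 2.68 instead of cutting to 2.67
```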
Upvotes: 0