Reputation: 1
I want to count how many times each letter (A...Z) occurs in my sequences, but my code is not counting correctly.
from Bio import SeqIO
PSEQ=[repr(seq_record.seq)for seq_record in SeqIO.parse("data.txt","fasta")]
print(PSEQ)
print(len(PSEQ))
PSEQ_ID=[(seq_record.id)for seq_record in SeqIO.parse("data.txt","fasta")]
print(PSEQ_ID)
PSEQ_ID=([i for i in PSEQ_ID[0:]])
PSEQ=([i for i in PSEQ[0:]])
print(len(PSEQ[0:]))
A=[i.count("A")for i in PSEQ]
B=[i.count("B")for i in PSEQ]
C=[i.count("C")for i in PSEQ]
D=[i.count("D")for i in PSEQ]
E=[i.count("E")for i in PSEQ]
F=[i.count("F")for i in PSEQ]
G=[i.count("G")for i in PSEQ]
H=[i.count("H")for i in PSEQ]
I=[i.count("I")for i in PSEQ]
J=[i.count("J")for i in PSEQ]
K=[i.count("K")for i in PSEQ]
L=[i.count("L")for i in PSEQ]
M=[i.count("M")for i in PSEQ]
N=[i.count("N")for i in PSEQ]
O=[i.count("O")for i in PSEQ]
P=[i.count("P")for i in PSEQ]
Q=[i.count("Q")for i in PSEQ]
R=[i.count("R")for i in PSEQ]
S=[i.count("S")for i in PSEQ]
T=[i.count("T")for i in PSEQ]
U=[i.count("U")for i in PSEQ]
V=[i.count("V")for i in PSEQ]
W=[i.count("W")for i in PSEQ]
X=[i.count("X")for i in PSEQ]
Y=[i.count("Y")for i in PSEQ]
Z=[i.count("Z")for i in PSEQ]
All={"A":A,"B":B,"C":C,"D":D,"E":E,"F":F,"G":G,"H":H,
"I":I,"J":J,"k":K,"L":L,"M":M,"N":N,"O":O,"P":P,"Q":Q,
"R":R,"S":S,"T":T,"U":U,"V":V,"W":W,"X":X,"Y":Y,"Z":Z}
#print(All)
import pandas as pd
df=pd.DataFrame(All)
print(df)
Here is my problem: the letter A occurs 7 times in my file, but the output shows it only 4 times. I want the count for A to be 7, matching the data in my file.
Upvotes: 0
Views: 377
Reputation: 545776
Your code
Both can be fixed by using collections.Counter to count items (in this case, letters). Then the entire code reduces to:
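from Bio import SeqIO
from collections import Counter
import pandas as pd

# filename is the path to your FASTA file (e.g. "data.txt" in the question)
frequencies = Counter()
for rec in SeqIO.parse(filename, 'fasta'):
    # Iterating over rec.seq yields one letter at a time
    frequencies.update(rec.seq)

df = pd.DataFrame.from_dict(frequencies, orient='index')
print(df)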
This merges the counts for all sequences in the FASTA file. If you want to keep them separate, just maintain a dictionary/list of Counters instead of a single Counter.
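To illustrate the "dictionary of Counters" variant, here is a minimal sketch. The helper names count_letters_per_record and as_rows, and the sample records, are made up for this example; the (id, sequence) pairs stand in for what you would get from SeqIO.parse.

```python
from collections import Counter
from string import ascii_uppercase


def count_letters_per_record(records):
    """Return {record_id: Counter} for an iterable of (id, sequence) pairs."""
    # str(seq) gives the full sequence text, whether seq is a str or a Biopython Seq.
    return {rec_id: Counter(str(seq)) for rec_id, seq in records}


def as_rows(per_record, alphabet=ascii_uppercase):
    """Expand each Counter into a full A..Z row; a missing letter counts as 0."""
    return {
        rec_id: [counts[letter] for letter in alphabet]
        for rec_id, counts in per_record.items()
    }


# Hypothetical stand-in data. With Biopython you would instead build the pairs as
#   records = ((rec.id, rec.seq) for rec in SeqIO.parse("data.txt", "fasta"))
records = [("seq1", "MAAKAL"), ("seq2", "GAV")]

per_record = count_letters_per_record(records)
rows = as_rows(per_record)

print(per_record["seq1"]["A"])  # 3
print(rows["seq2"][:3])         # counts for A, B, C: [1, 0, 0]
```

If you want the table layout from the question, the rows should fit pd.DataFrame.from_dict(rows, orient='index', columns=list(ascii_uppercase)). Note the use of str(seq) rather than repr(seq): the repr of a Seq is meant for display and abbreviates long sequences, so counting letters in it can undercount.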
Upvotes: 0
Reputation: 1672
Read the FASTA file with a small parser function and use count() to count a given letter in each sequence.
# Plain-Python FASTA reader; no Biopython imports are needed here.
def read_fasta(fp):
    """Yield (header, sequence) pairs from an open FASTA file."""
    name, seq = None, []
    for line in fp:
        line = line.rstrip()
        if line.startswith(">"):
            # A new header starts: emit the record collected so far
            if name: yield (name, ''.join(seq))
            name, seq = line, []
        else:
            seq.append(line)
    # Emit the last record in the file
    if name: yield (name, ''.join(seq))

with open('protein.fasta') as fp:
    for name, seq in read_fasta(fp):
        print(seq.count("A"))
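To count every letter rather than just "A", the same parser can be combined with collections.Counter. This is only a sketch: count_letters is a name invented here, and the in-memory FASTA text is sample data standing in for a real file opened with open().

```python
import io
from collections import Counter


def read_fasta(fp):
    """Yield (header, sequence) pairs from an open FASTA file."""
    name, seq = None, []
    for line in fp:
        line = line.rstrip()
        if line.startswith(">"):
            if name:
                yield (name, "".join(seq))
            name, seq = line, []
        else:
            seq.append(line)
    if name:
        yield (name, "".join(seq))


def count_letters(fp):
    """Return {header without '>': Counter of letters} for each record."""
    per_record = {}
    for name, seq in read_fasta(fp):
        # Upper-case first so 'a' and 'A' are counted together.
        per_record[name.lstrip(">")] = Counter(seq.upper())
    return per_record


# Sample data (made up): seq1 spans two lines, as FASTA sequences often do.
fasta_text = ">seq1\nMAAKA\nLAA\n>seq2\nGAAV\n"

per_record = count_letters(io.StringIO(fasta_text))
totals = sum(per_record.values(), Counter())  # merge all records into one Counter

print(per_record["seq1"]["A"])  # 5
print(per_record["seq2"]["A"])  # 2
print(totals["A"])              # 7
```

Because the counts are taken from the full joined sequence string, multi-line sequences are counted completely.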
Upvotes: 1