Reputation: 3
I need to count a special pattern from a text file. My code is:
sequence = input("Enter a sequence valid: ")
with open(original_file, 'r') as read_obj:
    for line in read_obj:
        count = 0
        for i in range(len(line)):
            if sequence.upper() == (line[i:i + len(sequence)].upper()):
                count += 1
        print(f"({count}) {line.upper()}", end=' ')
The output is:
Enter a sequence valid : ACA
(0) CATGTCGTAGCTAGCTACTGTACTATTATTATCTGGATCGTAC
(0) CTATGCGATGCTGACGTATCTAGCTACGTATCGTAGCTGATCTATCGATCGTATCGA
(0) CATGCTAGTCTAGCTAGCTAGCTAGCGTAGCTACTGAGTCGATC
(3) ACACACCCCACATTCTCGTACGATTTTCGGCGCGGGGCGGCCTATTATCTGCAT
(2) ACACAC
(0) TGTGTG
(15) ACACACACACACACACACACACACACACACAC
(1) TAGACAGTCGATCGACTGCAGCTTCG
(0) CCACCATGGGTGG
(0) AAAAATTTT
(0) GGGG
(0) AAAA
I need the total number of times the pattern is found across all lines; for example, in this case the total is 21 for ACA. My text file is:
CATGTCGTAGCTAGCTACTGTACTATTATTATCTGGATCGTAC
CTATGCGATGCTGACGTATCTAGCTACGTATCGTAGCTGATCTATCGATCGTATCGA
CATGCTAGTCTAGCTAGCTAGCTAGCGTAGCTACTGAGTCGATC
ACACACCCCACATTCTCGTACGATTTTCGGCGCGGGGCGGCCTATTATCTGCAT
ACACAC
TGTGTG
ACACACACACACACACACACACACACACACAC
TAGACAGTCGATCGACTGCAGCTTCG
CCACCATGGGTGG
AAAAATTTT
GGGG
AAAA
Upvotes: 0
Views: 48
Reputation: 14492
If you need to count all occurrences, including overlapping ones, you can do something like this:
s = """CATGTCGTAGCTAGCTACTGTACTATTATTATCTGGATCGTAC
CTATGCGATGCTGACGTATCTAGCTACGTATCGTAGCTGATCTATCGATCGTATCGA
CATGCTAGTCTAGCTAGCTAGCTAGCGTAGCTACTGAGTCGATC
ACACACCCCACATTCTCGTACGATTTTCGGCGCGGGGCGGCCTATTATCTGCAT
ACACAC
TGTGTG
ACACACACACACACACACACACACACACACAC
TAGACAGTCGATCGACTGCAGCTTCG
CCACCATGGGTGG
AAAAATTTT
GGGG
AAAA"""
def get_occurrences(s):
    counter = 0
    # Check every 3-character window, including the last one that starts at len(s) - 3
    for i in range(len(s) - 2):
        if s[i:i+3] == "ACA":
            counter += 1
    return counter

print(sum(get_occurrences(line) for line in s.split("\n")))
This will give you the desired 21.
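If the pattern comes from user input rather than being hard-coded, the same idea can be generalized. The following is only a minimal sketch: the function names `count_overlapping` and `count_overlapping_regex` and the sample `lines` list are made up for illustration, and the comparison is assumed to be case-insensitive, as in the question. Note that `str.count` would not work here because it skips overlapping matches (`"ACACAC".count("ACA")` is 1, not 2).

```python
import re

def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, overlaps included, ignoring case."""
    text, pattern = text.upper(), pattern.upper()
    if not pattern:
        return 0
    # Slide a window of len(pattern) over every valid start position.
    return sum(
        text[i:i + len(pattern)] == pattern
        for i in range(len(text) - len(pattern) + 1)
    )

def count_overlapping_regex(text, pattern):
    """Same count via a zero-width lookahead, which does not consume characters."""
    if not pattern:
        return 0
    return len(re.findall(f"(?={re.escape(pattern)})", text, flags=re.IGNORECASE))

# Hypothetical sample lines; an open file object can be iterated the same way.
lines = ["ACACAC\n", "TGTGTG\n", "TAGACAGTCGATCGACTGCAGCTTCG\n", "aca\n"]
total = sum(count_overlapping(line.strip(), "ACA") for line in lines)
print(total)  # 4
```

To get the total for a file, you could replace `lines` with the file object from `with open(...)` and keep the `sum(...)` line unchanged.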
Upvotes: 1