Reputation: 55
In the attached code, the for loop for m in re.finditer(match, query):
only executes on the last set of data in my data file.
I'm using Python 3.6.5.
I have placed print statements before, within and after the for loop, and the within also only executes on the last data set.
# Example: Finding a Motif in DNA
# The indexing in this problem starts from 1!
import re
def process_data(dataFile, outFile):
with open(dataFile) as data:
line = data.readline()
numTests = int(line)
print ("Number of tests: ", numTests)
while True:
line = data.readline()
if not line:
break
query = line
print ("Query: ", query)
line = data.readline()
match = line
print ("Match: ", match)
print("Before for statement")
for m in re.finditer(match, query):
print("In for statement")
print ("match start = ", m.start()+1)
print("After for statement")
data.close()
return
if __name__ == "__main__":
dataFile = "./test.txt"
test.txt looks like:
3
GACAACTTCCAACTTCCAACTTCCCGTCCCAACTTCACAACTTCGGCCCAACTTCCATGCAACTTCACCATCAACTTCGCTCGAAGCTGCCTTCCACTCCAACTTCACAACTTCCTCAACTTCCTCACCAACTTCAGCAACTTCTCTAGGGCCAACTTCCAACTTCTCAACTTCTCAACTTCCAACTTCCGACAACTTCTCCTGGCAACTTCCAACTTCCAACTTCAATACAACTTCGCAGACAACTTCCGCAACTTCGAACAACTTCCAACTTCCCCAACTTCCAACTTCCAACTTCGCCAACTTCCAACTTCCAACTTCCCAACTTCAGATAGCAACTTCGATCTTACACAACTTCACGCAACTTCTCCAACTTCCAACTTCTGTGCAACTTCTCTGAACAACTTCCTCAACTTCCAACTTCGCAACTTCCCCAACTTCCTCAACTTCATGCAACTTCGAGGCAACTTCCCAACTTCGCAACTTCCTATTCCCAACTTCTGTGGCAACTTCTCAACTTCTGGACAACTTCTATGCCCAACTTCACAACTTCCCCAACTTCTTTACAACTTCGACAACTTCATCAACTTCTAGTCAACTTCTGGTCCAACTTCCAACTTCCCCAACTTCCAAAGTGCCGCAACTTCGTAACAACTTCACGCGCTCAACTTCAACCAACTTCTTTTCCCGCAACTTCGCAACTTCACAACTTCTAATCAACTTCCAACTTCGGATCAACTTCCAACTTCGCCAACTTCCAACTTCCAACTTCTCCAGGGACAACTTCAAGTACAACTTCCAACTTCGCAACTTCACAACTTCCCAACTTCGCAACTTCTACACGCAACTTCCAACTTCTGGTCCCAACTTCATCAACTTCAGTCAACTTC
CAACTTCCA
GAAAGGATAAAGGATGTCAAAGGATGCAGATATAAAGGATAAAGGATACAAAGGATTAAGTATGTCCGAAAAAGGATAAAGGATTAAAAGGATCGCTGAACCTTACACAAAGGATAGTGAACAAAGGATTCAATAAAAGGATAAAGGATCAAAGGATCAAAAGGATACAAAGGATGCCGATGAAAGGATAAAGGATCTAAAGGATAAAGGATGAGAAAGGATTGGAAAGGATTATGAGATCAAAGGATAAAGGATGGTGTAAAAGCTAAAGGATTCGGCAAAGGATAAAGGATTAAAGGATAAAAAGGATAAAGGATGAAAGGATCCGCAGGGACCAGCAAAAGGATAAAGGATCGAATGGGTAAGAAAGGATCAAAGGATGAAAAAGGATTACTAAAGGATAAAGGATCCTGAAAGGATTACAAAGGATCTTAAAAGGATTCGGAAAGGATCCATAGGAAAAGGATAAAGGATCGCGAAAGGATAAAGGATTAAAGGATAAAGGATAAAAGGATTCAAAGGATAAAGGATAGACAAGGGAGAAAGGATGCAAAGGATGAAAGGATAAAGGATAAAGGATAAAGGATTAGGTTAAAGGATCGAAAGGATCCAAAGAGAGCGAAAAGGATGAGGACGAAAGGATCAAAGGATCCAAAGGATCAAAGGATAAAGGATTGCGATGGAAAGGATGTCAAAGGATCACCAAAGGATAAAGGATAATAAAGGATAAAGGATAAAGGATCAAAGGATTTGAAAAGGATAAAGGATCACGAAAGGATAAAAGGATTGGCAAAGGATGAAAAGGATGAAAGGATAAGCCCTCCAAAGGAT
AAAGGATAA
ATTCAAATACTTCAAATAGTCAAATATCAAATATGCTTCAAATATCAAATATCGGTCAAATAAGTCAAATAAGTCAAATATCAAATATCAAATACTCAAATATCAAATAGTGTCAAATACGAATTGGGTCAAATATCAAATATGTGTTCAAATATTCTTCAAATACTGGACACTCAAATAGGAGTCAAATATCAAATATCAAATACGTGAAGTGCTTGTCAAATATTTCAAATACTTTCAAATATTCAAATACTCAAATATGTTCAAATATCAAATATCAAATATTTCAAATATCAAATATCAAATAGTCCTCAAATAGCAAACCAGTCAAATATCAAATATGTCAAATATGCTCACGGCAACCTCAAATATCAAATATCAAATAGATGTCAAATAAGTTTCAAATATCAAATATCAAATATCAAATATCAAATATTCAAATATCAAATAGCGGGCTCAAATAAGCTCAAATACGTCAAATAGGGGGTCAAATAGATCAAATAGTCAAATATTCAAATACATCAAATAGTCAAATACAAGAACCACCGAGATCTCAAATAATCAAATATCAAATATGGTCAAATAAATATTCAAATAGGTCAAATAAGATCAAATATCAAATAAGTCGTCCATCAAATAGTCAAATAGCTCAAATAACCTCAAATATCAAATATTCAAATACCGGTCATCAAATAGGACAAATCAAATAGTCAAATAAGATCCTCTCAAATAATTCAAATAGCTGTTCAAATACTCAAATATCAAATAGTCAAATATCAAATATTCAAATATCAAATACTTCAAATATGTTCAAATAATCAAATACTCAAATATCAAATAATCAAATAATACTTCAAATATACCAAACGCTCAAATATTAGTTGGATCAAATATCTTCAAATATCAAATAA
TCAAATATC
The resulting output is:
Number of tests: 2
Query: GACAACTTCCAACTTCCAACTTCCCGTCCCAACTTCACAACTTCGGCCCAACTTCCATGCAACTTCACCATCAACTTCGCTCGAAGCTGCCTTCCACTCCAACTTCACAACTTCCTCAACTTCCTCACCAACTTCAGCAACTTCTCTAGGGCCAACTTCCAACTTCTCAACTTCTCAACTTCCAACTTCCGACAACTTCTCCTGGCAACTTCCAACTTCCAACTTCAATACAACTTCGCAGACAACTTCCGCAACTTCGAACAACTTCCAACTTCCCCAACTTCCAACTTCCAACTTCGCCAACTTCCAACTTCCAACTTCCCAACTTCAGATAGCAACTTCGATCTTACACAACTTCACGCAACTTCTCCAACTTCCAACTTCTGTGCAACTTCTCTGAACAACTTCCTCAACTTCCAACTTCGCAACTTCCCCAACTTCCTCAACTTCATGCAACTTCGAGGCAACTTCCCAACTTCGCAACTTCCTATTCCCAACTTCTGTGGCAACTTCTCAACTTCTGGACAACTTCTATGCCCAACTTCACAACTTCCCCAACTTCTTTACAACTTCGACAACTTCATCAACTTCTAGTCAACTTCTGGTCCAACTTCCAACTTCCCCAACTTCCAAAGTGCCGCAACTTCGTAACAACTTCACGCGCTCAACTTCAACCAACTTCTTTTCCCGCAACTTCGCAACTTCACAACTTCTAATCAACTTCCAACTTCGGATCAACTTCCAACTTCGCCAACTTCCAACTTCCAACTTCTCCAGGGACAACTTCAAGTACAACTTCCAACTTCGCAACTTCACAACTTCCCAACTTCGCAACTTCTACACGCAACTTCCAACTTCTGGTCCCAACTTCATCAACTTCAGTCAACTTC
Match: CAACTTCCA
Before for statement
After for statement
Query: GAAAGGATAAAGGATGTCAAAGGATGCAGATATAAAGGATAAAGGATACAAAGGATTAAGTATGTCCGAAAAAGGATAAAGGATTAAAAGGATCGCTGAACCTTACACAAAGGATAGTGAACAAAGGATTCAATAAAAGGATAAAGGATCAAAGGATCAAAAGGATACAAAGGATGCCGATGAAAGGATAAAGGATCTAAAGGATAAAGGATGAGAAAGGATTGGAAAGGATTATGAGATCAAAGGATAAAGGATGGTGTAAAAGCTAAAGGATTCGGCAAAGGATAAAGGATTAAAGGATAAAAAGGATAAAGGATGAAAGGATCCGCAGGGACCAGCAAAAGGATAAAGGATCGAATGGGTAAGAAAGGATCAAAGGATGAAAAAGGATTACTAAAGGATAAAGGATCCTGAAAGGATTACAAAGGATCTTAAAAGGATTCGGAAAGGATCCATAGGAAAAGGATAAAGGATCGCGAAAGGATAAAGGATTAAAGGATAAAGGATAAAAGGATTCAAAGGATAAAGGATAGACAAGGGAGAAAGGATGCAAAGGATGAAAGGATAAAGGATAAAGGATAAAGGATTAGGTTAAAGGATCGAAAGGATCCAAAGAGAGCGAAAAGGATGAGGACGAAAGGATCAAAGGATCCAAAGGATCAAAGGATAAAGGATTGCGATGGAAAGGATGTCAAAGGATCACCAAAGGATAAAGGATAATAAAGGATAAAGGATAAAGGATCAAAGGATTTGAAAAGGATAAAGGATCACGAAAGGATAAAAGGATTGGCAAAGGATGAAAAGGATGAAAGGATAAGCCCTCCAAAGGAT
Match: AAAGGATAA
Before for statement
In for statement
match start = 2
In for statement
match start = 34
In for statement
match start = 71
In for statement
match start = 136
In for statement
match start = 183
In for statement
match start = 199
In for statement
match start = 242
In for statement
match start = 280
In for statement
match start = 295
In for statement
match start = 304
In for statement
match start = 341
In for statement
match start = 396
In for statement
match start = 461
In for statement
match start = 479
In for statement
match start = 494
In for statement
match start = 518
In for statement
match start = 560
In for statement
match start = 574
In for statement
match start = 662
In for statement
match start = 705
In for statement
match start = 722
In for statement
match start = 755
In for statement
match start = 773
In for statement
match start = 809
After for statement
The first two set of match and query had no output. There are no error messages.
Upvotes: 0
Views: 320
Reputation: 61519
the for loop "for m in re.finditer(match, query):", only executes on the last set of data in my data file.
On the other lines, it executes - for zero iterations, because re.finditer(match, query)
finds zero matches. The reason is that both match
and query
have a newline at the end, except the last time through (since there is presumably no newline at the end of your file, and thus no newline at the end of the match
). This happens because the .readline()
method of the file includes the newline character at the end of the read-in line.
You can remove these with e.g. query = query.strip()
.
Upvotes: 1