Jam One
Jam One

Reputation: 55

for loop only executes on last iteration

In the attached code, the for loop for m in re.finditer(match, query): only executes on the last set of data in my data file.

I'm using Python 3.6.5.

I have placed print statements before, within and after the for loop, and the within also only executes on the last data set.

# Example: Finding a Motif in DNA
# The indexing in this problem starts from 1!
import re

def process_data(dataFile, outFile):
    with open(dataFile) as data:
        line = data.readline()
        numTests = int(line)
        print ("Number of tests: ", numTests)

        while True:
            line = data.readline()
            if not line:
                break

            query = line
            print ("Query: ", query)
            line = data.readline()
            match = line
            print ("Match: ", match)

            print("Before for statement")
            for m in re.finditer(match, query):
                print("In for statement")
                print ("match start = ", m.start()+1)

            print("After for statement")

    data.close()
    return


if __name__ == "__main__":
    dataFile = "./test.txt"

test.txt looks like:

3
GACAACTTCCAACTTCCAACTTCCCGTCCCAACTTCACAACTTCGGCCCAACTTCCATGCAACTTCACCATCAACTTCGCTCGAAGCTGCCTTCCACTCCAACTTCACAACTTCCTCAACTTCCTCACCAACTTCAGCAACTTCTCTAGGGCCAACTTCCAACTTCTCAACTTCTCAACTTCCAACTTCCGACAACTTCTCCTGGCAACTTCCAACTTCCAACTTCAATACAACTTCGCAGACAACTTCCGCAACTTCGAACAACTTCCAACTTCCCCAACTTCCAACTTCCAACTTCGCCAACTTCCAACTTCCAACTTCCCAACTTCAGATAGCAACTTCGATCTTACACAACTTCACGCAACTTCTCCAACTTCCAACTTCTGTGCAACTTCTCTGAACAACTTCCTCAACTTCCAACTTCGCAACTTCCCCAACTTCCTCAACTTCATGCAACTTCGAGGCAACTTCCCAACTTCGCAACTTCCTATTCCCAACTTCTGTGGCAACTTCTCAACTTCTGGACAACTTCTATGCCCAACTTCACAACTTCCCCAACTTCTTTACAACTTCGACAACTTCATCAACTTCTAGTCAACTTCTGGTCCAACTTCCAACTTCCCCAACTTCCAAAGTGCCGCAACTTCGTAACAACTTCACGCGCTCAACTTCAACCAACTTCTTTTCCCGCAACTTCGCAACTTCACAACTTCTAATCAACTTCCAACTTCGGATCAACTTCCAACTTCGCCAACTTCCAACTTCCAACTTCTCCAGGGACAACTTCAAGTACAACTTCCAACTTCGCAACTTCACAACTTCCCAACTTCGCAACTTCTACACGCAACTTCCAACTTCTGGTCCCAACTTCATCAACTTCAGTCAACTTC
CAACTTCCA
GAAAGGATAAAGGATGTCAAAGGATGCAGATATAAAGGATAAAGGATACAAAGGATTAAGTATGTCCGAAAAAGGATAAAGGATTAAAAGGATCGCTGAACCTTACACAAAGGATAGTGAACAAAGGATTCAATAAAAGGATAAAGGATCAAAGGATCAAAAGGATACAAAGGATGCCGATGAAAGGATAAAGGATCTAAAGGATAAAGGATGAGAAAGGATTGGAAAGGATTATGAGATCAAAGGATAAAGGATGGTGTAAAAGCTAAAGGATTCGGCAAAGGATAAAGGATTAAAGGATAAAAAGGATAAAGGATGAAAGGATCCGCAGGGACCAGCAAAAGGATAAAGGATCGAATGGGTAAGAAAGGATCAAAGGATGAAAAAGGATTACTAAAGGATAAAGGATCCTGAAAGGATTACAAAGGATCTTAAAAGGATTCGGAAAGGATCCATAGGAAAAGGATAAAGGATCGCGAAAGGATAAAGGATTAAAGGATAAAGGATAAAAGGATTCAAAGGATAAAGGATAGACAAGGGAGAAAGGATGCAAAGGATGAAAGGATAAAGGATAAAGGATAAAGGATTAGGTTAAAGGATCGAAAGGATCCAAAGAGAGCGAAAAGGATGAGGACGAAAGGATCAAAGGATCCAAAGGATCAAAGGATAAAGGATTGCGATGGAAAGGATGTCAAAGGATCACCAAAGGATAAAGGATAATAAAGGATAAAGGATAAAGGATCAAAGGATTTGAAAAGGATAAAGGATCACGAAAGGATAAAAGGATTGGCAAAGGATGAAAAGGATGAAAGGATAAGCCCTCCAAAGGAT
AAAGGATAA
ATTCAAATACTTCAAATAGTCAAATATCAAATATGCTTCAAATATCAAATATCGGTCAAATAAGTCAAATAAGTCAAATATCAAATATCAAATACTCAAATATCAAATAGTGTCAAATACGAATTGGGTCAAATATCAAATATGTGTTCAAATATTCTTCAAATACTGGACACTCAAATAGGAGTCAAATATCAAATATCAAATACGTGAAGTGCTTGTCAAATATTTCAAATACTTTCAAATATTCAAATACTCAAATATGTTCAAATATCAAATATCAAATATTTCAAATATCAAATATCAAATAGTCCTCAAATAGCAAACCAGTCAAATATCAAATATGTCAAATATGCTCACGGCAACCTCAAATATCAAATATCAAATAGATGTCAAATAAGTTTCAAATATCAAATATCAAATATCAAATATCAAATATTCAAATATCAAATAGCGGGCTCAAATAAGCTCAAATACGTCAAATAGGGGGTCAAATAGATCAAATAGTCAAATATTCAAATACATCAAATAGTCAAATACAAGAACCACCGAGATCTCAAATAATCAAATATCAAATATGGTCAAATAAATATTCAAATAGGTCAAATAAGATCAAATATCAAATAAGTCGTCCATCAAATAGTCAAATAGCTCAAATAACCTCAAATATCAAATATTCAAATACCGGTCATCAAATAGGACAAATCAAATAGTCAAATAAGATCCTCTCAAATAATTCAAATAGCTGTTCAAATACTCAAATATCAAATAGTCAAATATCAAATATTCAAATATCAAATACTTCAAATATGTTCAAATAATCAAATACTCAAATATCAAATAATCAAATAATACTTCAAATATACCAAACGCTCAAATATTAGTTGGATCAAATATCTTCAAATATCAAATAA
TCAAATATC

The resulting output is:

Number of tests:  2
Query:  GACAACTTCCAACTTCCAACTTCCCGTCCCAACTTCACAACTTCGGCCCAACTTCCATGCAACTTCACCATCAACTTCGCTCGAAGCTGCCTTCCACTCCAACTTCACAACTTCCTCAACTTCCTCACCAACTTCAGCAACTTCTCTAGGGCCAACTTCCAACTTCTCAACTTCTCAACTTCCAACTTCCGACAACTTCTCCTGGCAACTTCCAACTTCCAACTTCAATACAACTTCGCAGACAACTTCCGCAACTTCGAACAACTTCCAACTTCCCCAACTTCCAACTTCCAACTTCGCCAACTTCCAACTTCCAACTTCCCAACTTCAGATAGCAACTTCGATCTTACACAACTTCACGCAACTTCTCCAACTTCCAACTTCTGTGCAACTTCTCTGAACAACTTCCTCAACTTCCAACTTCGCAACTTCCCCAACTTCCTCAACTTCATGCAACTTCGAGGCAACTTCCCAACTTCGCAACTTCCTATTCCCAACTTCTGTGGCAACTTCTCAACTTCTGGACAACTTCTATGCCCAACTTCACAACTTCCCCAACTTCTTTACAACTTCGACAACTTCATCAACTTCTAGTCAACTTCTGGTCCAACTTCCAACTTCCCCAACTTCCAAAGTGCCGCAACTTCGTAACAACTTCACGCGCTCAACTTCAACCAACTTCTTTTCCCGCAACTTCGCAACTTCACAACTTCTAATCAACTTCCAACTTCGGATCAACTTCCAACTTCGCCAACTTCCAACTTCCAACTTCTCCAGGGACAACTTCAAGTACAACTTCCAACTTCGCAACTTCACAACTTCCCAACTTCGCAACTTCTACACGCAACTTCCAACTTCTGGTCCCAACTTCATCAACTTCAGTCAACTTC

Match:  CAACTTCCA

Before for statement
After for statement
Query:  GAAAGGATAAAGGATGTCAAAGGATGCAGATATAAAGGATAAAGGATACAAAGGATTAAGTATGTCCGAAAAAGGATAAAGGATTAAAAGGATCGCTGAACCTTACACAAAGGATAGTGAACAAAGGATTCAATAAAAGGATAAAGGATCAAAGGATCAAAAGGATACAAAGGATGCCGATGAAAGGATAAAGGATCTAAAGGATAAAGGATGAGAAAGGATTGGAAAGGATTATGAGATCAAAGGATAAAGGATGGTGTAAAAGCTAAAGGATTCGGCAAAGGATAAAGGATTAAAGGATAAAAAGGATAAAGGATGAAAGGATCCGCAGGGACCAGCAAAAGGATAAAGGATCGAATGGGTAAGAAAGGATCAAAGGATGAAAAAGGATTACTAAAGGATAAAGGATCCTGAAAGGATTACAAAGGATCTTAAAAGGATTCGGAAAGGATCCATAGGAAAAGGATAAAGGATCGCGAAAGGATAAAGGATTAAAGGATAAAGGATAAAAGGATTCAAAGGATAAAGGATAGACAAGGGAGAAAGGATGCAAAGGATGAAAGGATAAAGGATAAAGGATAAAGGATTAGGTTAAAGGATCGAAAGGATCCAAAGAGAGCGAAAAGGATGAGGACGAAAGGATCAAAGGATCCAAAGGATCAAAGGATAAAGGATTGCGATGGAAAGGATGTCAAAGGATCACCAAAGGATAAAGGATAATAAAGGATAAAGGATAAAGGATCAAAGGATTTGAAAAGGATAAAGGATCACGAAAGGATAAAAGGATTGGCAAAGGATGAAAAGGATGAAAGGATAAGCCCTCCAAAGGAT

Match:  AAAGGATAA
Before for statement
In for statement
match start =  2
In for statement
match start =  34
In for statement
match start =  71
In for statement
match start =  136
In for statement
match start =  183
In for statement
match start =  199
In for statement
match start =  242
In for statement
match start =  280
In for statement
match start =  295
In for statement
match start =  304
In for statement
match start =  341
In for statement
match start =  396
In for statement
match start =  461
In for statement
match start =  479
In for statement
match start =  494
In for statement
match start =  518
In for statement
match start =  560
In for statement
match start =  574
In for statement
match start =  662
In for statement
match start =  705
In for statement
match start =  722
In for statement
match start =  755
In for statement
match start =  773
In for statement
match start =  809
After for statement

The first two set of match and query had no output. There are no error messages.

Upvotes: 0

Views: 320

Answers (1)

Karl Knechtel
Karl Knechtel

Reputation: 61519

the for loop "for m in re.finditer(match, query):", only executes on the last set of data in my data file.

On the other lines, it executes - for zero iterations, because re.finditer(match, query) finds zero matches. The reason is that both match and query have a newline at the end, except the last time through (since there is presumably no newline at the end of your file, and thus no newline at the end of the match). This happens because the .readline() method of the file includes the newline character at the end of the read-in line.

You can remove these with e.g. query = query.strip().

Upvotes: 1

Related Questions