user5002536


Cancel last line iteration on a file

I need to iterate over a file, stop iterating on a condition, and then continue parsing the file at the same line with another function (which may change, so I can't just add its content to the first function).

An example file (file.txt) :

1
2
3
4
5
6
7
8
9

The functions I am trying to write:

def parse1(file, stop):
    # 1st parsing function (Main function I am doing)
    for line in file:
        if line.strip() == stop:
            # Stop parsing on condition
            break
        else:
            # Parse the line (just print for example)
            print(line)

def parse2(file):
    # 2nd parsing function (Will be my own functions or external functions)
    for line in file:
        # Parse the line (just print for example)
        print(line)

Result in terminal:

>>> file = open("file.txt")

>>> parse1(file, "4")
1
2
3

>>> parse2(file)
5
6
7
8
9

My problem is that the "4" line is consumed by the 1st function when it checks the stop condition, so the 2nd function never sees it.

How can I avoid this? I haven't found any way to cancel the last iteration or go back one line.

The file.tell() function doesn't work inside a for loop over a file.

I tried doing this with while + file.readline(), but it is much slower than the for loop over the file (and I want to parse files with millions of lines).

Is there an elegant solution that keeps the for loop?

Upvotes: 3

Views: 274

Answers (3)

Ser Jothan Chanes

Reputation: 85

In Python 3, the 'for line in file' construct is driven by an iterator internally. By definition, a value that has been produced by an iterator cannot be 'put back' for later use (http://www.diveintopython3.net/iterators.html).
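To see what this means in practice, here is a minimal sketch that uses a plain list iterator in place of a file (the names lines and remaining are made up for illustration). Once the loop has pulled the stop line out of the iterator, whatever consumes it next only sees what comes after that line:

```python
# A list iterator stands in for the file object here.
lines = iter(["1\n", "2\n", "3\n", "4\n", "5\n"])

for line in lines:
    if line.strip() == "3":
        break  # "3\n" has already been consumed at this point

# Whatever consumes the iterator next starts after the stop line.
remaining = list(lines)
print(remaining)  # ['4\n', '5\n']
```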

To get the desired behaviour, you need a function that chains together two iterators, such as the chain function provided by the itertools module. In the stop condition of parse1, you return the last line together with the file iterator:

import itertools

def parse1(file, stop):
    # 1st parsing function
    for line in file:
        # Stop parsing on condition
        if line.strip() == stop:
            return itertools.chain([line], file)  # important line
        else:
            # Parse the line (just print for example)
            print('parse1: ' + line)

The chain function connects two iterables. The first contains just one element: the line you want to process again. The second is the file iterator, which yields the remaining part of the file. As soon as the first runs out of values, the second is accessed.
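As a small standalone illustration of that behaviour (a sketch only; first, rest and combined are illustrative names, and a list iterator stands in for the rest of the file):

```python
import itertools

first = ["4\n"]              # the line to process again
rest = iter(["5\n", "6\n"])  # stands in for the remaining file lines

# chain yields everything from `first`, then continues with `rest`.
combined = itertools.chain(first, rest)
result = list(combined)
print(result)  # ['4\n', '5\n', '6\n']
```

Nothing is copied or read ahead here: chain only pulls from rest once first is exhausted, so this should stay cheap even for very large files.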

You don't need to change parse2. For clarity, I modified the print statement:

def parse2(file):
    # 2nd parsing function
    for line in file:
        # Parse the line (just print for example)
        print('parse2: ' + line)

Then, you can call parse1 and parse2 in a most functional manner:

with open('testfile','r') as infile:
   parse2(parse1(infile,'4'))

The output of the above line is:

parse1: 1
parse1: 2
parse1: 3
parse2: 4
parse2: 5
parse2: 6
parse2: 7
parse2: 8
parse2: 9

Note that the line '4' was processed by the parse2 function.
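One case the parse1 shown in this answer does not handle: if the stop value never appears, it falls off the end of the loop and returns None, so parse2 raises a TypeError when it tries to iterate over None. A possible way around this, sketched here with a hypothetical variant called parse1_safe and io.StringIO standing in for a real file, is to return an empty iterator in that case:

```python
import io
import itertools

def parse1_safe(file, stop):
    # Hypothetical variant of parse1 that always returns an iterator.
    for line in file:
        if line.strip() == stop:
            # Put the stop line back in front of the remaining lines.
            return itertools.chain([line], file)
        print('parse1: ' + line.strip())
    # Stop value never seen: nothing is left for the next parser.
    return iter(())

# io.StringIO stands in for a real file.
leftover = list(parse1_safe(io.StringIO("1\n2\n3\n"), "4"))
found = list(parse1_safe(io.StringIO("1\n2\n3\n"), "2"))
print(leftover)  # []
print(found)     # ['2\n', '3\n']
```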

Upvotes: 2

Kasravnd

Reputation: 107287

I suggest making a copy [1] of your file object, iterating over the copy, advancing the original in the else block, and calling the second function from within the first one. As a more Pythonic approach, you can also open the file with a with statement, which closes the file at the end of the block:

#ex.txt

1
2
3
4
5
6
7
8
9
10

You can use itertools.tee to create a copy [1] of your file object:

from itertools import tee

def parse1(file_name, stop):

    def parse2(file_obj):
        print('**********')
        for line in file_obj:
            print(line)

    with open(file_name) as file_obj:
        # temp is used to look ahead; file_obj stays one step behind it
        temp, file_obj = tee(file_obj)
        for line in temp:
            if line.strip() == stop:
                break
            else:
                # advance file_obj past the line that was just parsed
                next(file_obj)
                print(line)
        parse2(file_obj)

parse1("ex.txt", '4')

result :

1

2

3

**********
4

5

6

7

8

9

10

1) Strictly speaking, itertools.tee doesn't create a copy, but you can use it for this purpose: according to the documentation, it returns n independent iterators from a single iterable. You can assign one of these independent iterators back to the name of the object being iterated and keep the other one as temp.
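A minimal sketch of what "independent" means here, using a list iterator instead of a file (source, a and b are illustrative names):

```python
from itertools import tee

source = iter(["1", "2", "3"])
a, b = tee(source)  # two independent iterators over the same data

first_from_a = next(a)
second_from_a = next(a)
first_from_b = next(b)  # b is unaffected by how far a has advanced

print(first_from_a, second_from_a, first_from_b)  # 1 2 1
```

Two things from the itertools documentation are worth keeping in mind: once tee has been called, the original iterable should not be used directly any more, and tee has to buffer the items that one iterator has seen and the other has not. In the parse1 above, the two iterators never drift more than one line apart, so that buffer should stay small.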

Upvotes: 1

Serge Ballesta

Reputation: 148965

IMHO, the simplest solution is to have the first parser return the line on which it found the stop condition, and pass that line to the second one. The second parser should have an explicit function that parses a single line, to avoid code duplication:

def parse1(file, stop):
    # 1st parsing function (Main function I am doing)
    for line in file:
        if line.strip() == stop:
            # Stop parsing on condition
            return line
        else:
            # Parse the line (just print for example)
            print(line)
    return None

def parse2(file, line=None):
    # 2nd parsing function (Will be my own functions or external functions)
    def doParse(line):
        # do actual parsing (just print for example)
        print(line)
    # first handle the line passed on by parse1, if there is one
    if line is not None:
        doParse(line)
    for line in file:
        doParse(line)

# main
...
stop_line = parse1(file, "4")  # "4" is the stop value from the question
if stop_line:
    parse2(file, stop_line)
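A minimal end-to-end sketch of this approach. To make the result easy to inspect, it deviates from the code above in two ways that are my own assumptions: io.StringIO stands in for the real file, and both functions collect the stripped lines into lists (so parse1 returns a pair) instead of printing them. The names seen1, seen2 and stop_line are illustrative.

```python
import io

def parse1(file, stop):
    # Returns (parsed lines, stop line); the stop line is None if never found.
    seen = []
    for line in file:
        if line.strip() == stop:
            return seen, line
        seen.append(line.strip())
    return seen, None

def parse2(file, line=None):
    seen = []
    if line is not None:
        seen.append(line.strip())  # handle the line handed over by parse1
    for line in file:
        seen.append(line.strip())
    return seen

file = io.StringIO("1\n2\n3\n4\n5\n6\n")
seen1, stop_line = parse1(file, "4")
seen2 = parse2(file, stop_line) if stop_line is not None else []
print(seen1)  # ['1', '2', '3']
print(seen2)  # ['4', '5', '6']
```

The file object is still iterated with a plain for loop in both functions, so no line is read twice and nothing needs to be buffered.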

Upvotes: 0
