I need to iterate over a file, stop iterating when a condition is met, and then continue parsing the file from that same line with another function (the second function may change, so I can't just move its logic into the first one).
An example file (file.txt):
1
2
3
4
5
6
7
8
9
The functions I am trying to write:
def parse1(file, stop):
    # 1st parsing function (Main function I am doing)
    for line in file:
        if line.strip() == stop:
            # Stop parsing on condition
            break
        else:
            # Parse the line (just print for example)
            print(line)

def parse2(file):
    # 2nd parsing function (Will be my own functions or external functions)
    for line in file:
        # Parse the line (just print for example)
        print(line)
Result in terminal:
>>> file = open("file.txt")
>>> parse1(file, "4")
1
2
3
>>> parse2(file)
5
6
7
8
9
My problem is that the "4" line is consumed by the first function when it checks the condition, so the second function never sees it.
How can I avoid this? I haven't found any way to cancel the last iteration or to go back one line.
The file.tell() function doesn't work inside a for loop over a file.
I tried a while loop with file.readline(), but it is much slower than the for loop over the file (and I want to parse files with millions of lines).
Is there an elegant solution that keeps the for loop?
Upvotes: 3
Views: 274
Reputation: 85
In Python 3, the 'for line in file' construct is driven by an iterator internally. By definition, a value that has been produced by an iterator cannot be 'put back' for later use (http://www.diveintopython3.net/iterators.html).
To get the desired behaviour, you need a function that chains two iterators together, such as the chain function provided by the itertools module. When the stop condition in parse1 is met, return the last line together with the file iterator:
import itertools

def parse1(file, stop):
    # 1st parsing function
    for line in file:
        # Stop parsing on condition
        if line.strip() == stop:
            return itertools.chain([line], file)  # important line
        else:
            # Parse the line (just print for example)
            print('parse1: ' + line)
The chain call connects two iterators. The first iterator contains just one element: the line you want to process again. The second iterator is the remaining part of the file. As soon as the first iterator runs out of values, the second iterator is accessed.
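To make the chaining behaviour concrete, here is a minimal sketch that uses a plain list iterator in place of a file. The helper name push_back is hypothetical and only serves this illustration; it is not part of itertools.

```python
import itertools

def push_back(first_item, rest):
    """Hypothetical helper: yield first_item, then whatever is left in rest."""
    return itertools.chain([first_item], rest)

numbers = iter(["1\n", "2\n", "3\n"])
consumed = next(numbers)                  # the iterator has now moved past "1\n"
restored = push_back(consumed, numbers)   # the consumed item goes back in front
result = list(restored)
print(result)
```

The consumed value is not really returned to the original iterator; a new iterator is built that yields it first and then continues with the original one.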
You don't need to change parse2. For clarity, I modified the print statement:
def parse2(file):
    # 2nd parsing function
    for line in file:
        # Parse the line (just print for example)
        print('parse2: ' + line)
Then you can call parse1 and parse2 in a functional style:
with open('testfile', 'r') as infile:
    parse2(parse1(infile, '4'))
The output of the code above is:
parse1: 1
parse1: 2
parse1: 3
parse2: 4
parse2: 5
parse2: 6
parse2: 7
parse2: 8
parse2: 9
Note how the value '4' was produced by the parse2 function.
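One case to watch for: if the stop value never appears, a function that only returns inside the loop falls off the end and returns None, and a following parser would then fail when it tries to iterate over None. A possible variant, sketched here under the assumption that an empty iterator is an acceptable result in that case, returns iter(()) instead. The seen list is a hypothetical stand-in for the real parsing work, and io.StringIO stands in for an open file.

```python
import io
import itertools

def parse1(file, stop, seen):
    # seen is a hypothetical list that stands in for the real parsing work
    for line in file:
        if line.strip() == stop:
            # Put the stop line back in front of the remaining lines
            return itertools.chain([line], file)
        seen.append(line.strip())
    # Stop value never found: return an empty iterator instead of None
    return iter(())

# Stop value missing: nothing is left for the next parser
leftover = list(parse1(io.StringIO("1\n2\n3\n"), "9", []))

# Stop value present: the next parser starts at the stop line
seen = []
rest = list(parse1(io.StringIO("1\n2\n3\n"), "2", seen))
print(leftover, seen, rest)
```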
Upvotes: 2
Reputation: 107287
I suggest making a copy (1) of your file object, looping over the copy, advancing the original only in the else block, and calling the second function from within the first one. As a more Pythonic approach, you can also open the file with a with statement, which closes the file when the block ends:
#ex.txt
1
2
3
4
5
6
7
8
9
10
You can use itertools.tee to create a copy (1) of your file object:
from itertools import tee

def parse1(file_name, stop):
    def parse2(file_obj):
        print('**********')
        for line in file_obj:
            print(line)

    with open(file_name) as file_obj:
        # temp runs ahead; file_obj trails one step behind it
        temp, file_obj = tee(file_obj)
        for line in temp:
            if line.strip() == stop:
                break
            else:
                # Advance the trailing iterator only for lines already parsed
                next(file_obj)
                print(line)
        parse2(file_obj)

parse1("ex.txt", '4')
Result:
1
2
3
**********
4
5
6
7
8
9
10
1) Strictly speaking, itertools.tee doesn't create a copy, but it can serve that purpose here. According to the documentation, it returns n independent iterators from a single iterable. You can rebind one of these independent iterators to the name of the original object and use the other one as temp.
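As a small illustration of what "independent iterators" means here, the sketch below uses a list iterator in place of a file. The names lookahead and trailing are chosen for this example only.

```python
from itertools import tee

lines = iter(["1\n", "2\n", "3\n", "4\n"])
lookahead, trailing = tee(lines)   # two iterators over the same underlying data

first = next(lookahead)    # advance only the look-ahead iterator
second = next(lookahead)

# trailing has not moved, so it still starts from the first item
remaining = list(trailing)
print(first, second, remaining)
```

The itertools documentation notes that once tee has been called, the original iterable should not be used anywhere else, which is consistent with rebinding the original name to one of the returned iterators.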
Upvotes: 1
Reputation: 148965
IMHO, the simplest solution is to have the first parser return the line on which it found the stop condition and pass that line to the second parser. The second parser should have an explicit function for parsing one line, to avoid code duplication:
def parse1(file, stop):
    # 1st parsing function (Main function I am doing)
    for line in file:
        if line.strip() == stop:
            # Stop parsing on condition
            return line
        else:
            # Parse the line (just print for example)
            print(line)
    return None

def parse2(file, line=None):
    # 2nd parsing function (Will be my own functions or external functions)
    def doParse(line):
        # do actual parsing (just print for example)
        print(line)
    # Parse the pending stop line first, if there is one
    if line is not None:
        doParse(line)
    for line in file:
        doParse(line)

# main
...
stop_line = parse1(file, "4")  # "4" is the stop value from the question
if stop_line is not None:
    parse2(file, stop_line)
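For a quick check of this approach without a file on disk, here is a self-contained sketch that uses io.StringIO as a stand-in for the open file and collects results in a list instead of printing them. The out parameter and the "parse1: " / "parse2: " labels are assumptions added for this illustration.

```python
import io

def parse1(file, stop, out):
    # Parse lines until the stop value is found, then hand that line back
    for line in file:
        if line.strip() == stop:
            return line
        out.append("parse1: " + line.strip())
    return None

def parse2(file, out, line=None):
    # Parse the pending line first (if any), then the rest of the file
    if line is not None:
        out.append("parse2: " + line.strip())
    for line in file:
        out.append("parse2: " + line.strip())

sample = io.StringIO("1\n2\n3\n4\n5\n")
out = []
stop_line = parse1(sample, "4", out)
if stop_line is not None:
    parse2(sample, out, stop_line)
print(out)
```

Because the file object is its own iterator, the second for loop picks up right after the line on which the first loop stopped.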
Upvotes: 0