I'm reading a file in Python where each record is separated by an empty line. If the file ends in two or more newlines, the last record is processed as expected, but if the file ends in a single newline, it's not processed. Here's the code:
def fread():
    record = False
    for line in open('somefile.txt'):
        if line.startswith('Record'):
            record = True
            d = SomeObject()
        # do some processing with line
        d.process(line)
        if not line.strip() and record:
            yield d
            record = False

for record in fread():
    print(record)
In this data sample, everything works as expected ('\n' marks an empty line):
Record 1
data a
data b
data c
\n
Record 2
data a
data b
data c
\n
\n
But in this one, the last record isn't returned:
Record 1
data a
data b
data c
\n
Record 2
data a
data b
data c
\n
How can I preserve the last newline from the file to get the last record?
PS: I'm using the term "preserve" as I couldn't find a better name.
Thanks.
Edit: The original code was a stripped-down version, just to illustrate the problem, but it seems that I stripped too much. I have now posted the function's full code.
A little more explanation: a SomeObject is created for each record in the file, and the records are separated by empty lines. At the end of the record the function yields the object back so I can use it (save it to a db, compare it to other objects, etc.).
The main problem is that when the file ends in a single newline, the last record isn't yielded. It seems that Python does not read the last line when it's blank.
Upvotes: 2
Views: 722
Reputation: 97902
You might find that a slight twist in a more classically Pythonic direction improves the predictability of the code:
def fread():
    for line in open('text.txt'):
        if line.strip():
            d = SomeObject()
            yield d
    return

for record in fread():
    print(record)
A generator ends when its body returns, so the explicit return is not strictly necessary; Python raises StopIteration on the caller's side for you. (Raising StopIteration by hand inside a generator turns into a RuntimeError as of Python 3.7, per PEP 479.) Using if line.strip() simply means that you'll do the yield if there's anything remaining in line after stripping whitespace. The construction of SomeObject() can be anywhere... I just happened to move it in case construction of SomeObject was expensive, or had side effects that shouldn't happen if the line is empty.
EDIT: I'll leave my answer here for posterity's sake, but DNS below got the original intent right, where several lines contribute to the same SomeObject() record (which I totally glossed over).
Upvotes: 5
Reputation: 8966
Replace open('somefile.txt'):
with open('somefile.txt').read().split('\n'):
and your code will work.
But Jarret Hardie's answer is better.
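To see why the split version picks up the last record, here is a small sketch that uses an in-memory string in place of the file contents (the sample text is made up for illustration). When the text ends in a newline, str.split('\n') leaves a trailing empty string, and that empty string plays the role of the missing blank line; str.splitlines() does not produce it.

```python
# Hypothetical file contents: the last record ends in a single newline.
text = "Record 1\ndata a\n\nRecord 2\ndata a\n"

split_lines = text.split('\n')
print(split_lines)
# ['Record 1', 'data a', '', 'Record 2', 'data a', '']

plain_lines = text.splitlines()
print(plain_lines)
# ['Record 1', 'data a', '', 'Record 2', 'data a']
```

Keep in mind that read() loads the whole file into memory, and the lines no longer carry their trailing '\n'.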
Upvotes: 0
Reputation: 38189
The way it's written now probably doesn't work anyway; with d = SomeObject() inside your loop, a new SomeObject is being created for every line. Yet, if I understand correctly, what you want is for all of the lines in between empty lines to contribute to that one object. You could do something like this instead:
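def fread():
    d = None
    for line in open('somefile.txt'):
        if line.strip():
            if d is None:
                d = SomeObject()
            d.process(line)  # do some processing
        elif d is not None:
            # an empty line closes the current record
            yield d
            d = None
    # the last record may not be followed by an empty line
    if d is not None:
        yield d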
This isn't great code, but it does work; that last object that misses its empty line is yielded when the loop is done.
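If you'd rather not track the current object by hand, the same grouping can be done with itertools.groupby from the standard library. This is a different technique from the loop above, sketched here under the assumption that a record is simply a run of consecutive non-empty lines; read_blocks and the sample strings are hypothetical names and data for illustration.

```python
import io
from itertools import groupby

def read_blocks(lines):
    """Yield each run of consecutive non-empty lines as a list of stripped strings."""
    # groupby splits the input into alternating runs of non-empty and empty lines.
    for has_text, group in groupby(lines, key=lambda line: bool(line.strip())):
        if has_text:
            yield [line.strip() for line in group]

# io.StringIO stands in for the open file; one sample ends in a single
# newline, the other in several.
one_newline = io.StringIO("Record 1\ndata a\n\nRecord 2\ndata a\n")
many_newlines = io.StringIO("Record 1\ndata a\n\nRecord 2\ndata a\n\n\n")

blocks_one = list(read_blocks(one_newline))
blocks_many = list(read_blocks(many_newlines))
print(blocks_one)
print(blocks_one == blocks_many)
```

Each yielded list could then be fed into a SomeObject, and the result no longer depends on how many newlines the file ends with.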
Upvotes: 6
Reputation: 36174
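If you call readline repeatedly (in a loop) on your file object (instead of using in) it should work as you expect, because readline returns an empty string at the end of the file, which you can treat as the end of the last record. Compare these: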
>>> x = open('/tmp/xyz')
>>> x.readline()
'x\n'
>>> x.readline()
'\n'
>>> x.readline()
'y\n'
>>> x.readline()
''
>>> open('/tmp/xyz').readlines()
['x\n', '\n', 'y\n']
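A minimal sketch of such a loop, assuming the same contents as /tmp/xyz in the session above and using io.StringIO in place of the real file (iter_with_eof is a made-up helper name). It yields every line and then the final '' that readline returns at end of file; since ''.strip() is empty too, a blank-line check like the one in the question would also fire on that last value.

```python
import io

def iter_with_eof(f):
    """Yield each line from f, then one final '' to mark end of file."""
    while True:
        line = f.readline()
        yield line
        if line == '':
            # readline returns '' only at end of file
            break

f = io.StringIO("x\n\ny\n")  # stands in for open('/tmp/xyz')
lines = list(iter_with_eof(f))
print(lines)
# ['x\n', '\n', 'y\n', '']
```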
Upvotes: 0
Reputation: 29877
line.strip() will result in an empty string on an empty line. An empty string is falsy, so you swallow the empty line:
>>> bool("\n".strip())
False
>>> bool("\n")
True
Upvotes: 0