user32117

Preserving last new line when reading a file

I'm reading a file in Python where each record is separated by an empty line. If the file ends in two or more newlines, the last record is processed as expected, but if the file ends in a single newline, it's not processed. Here's the code:

def fread():
    record = False
    for line in open('somefile.txt'):
        if line.startswith('Record'):
            record = True
            d = SomeObject()

        # do some processing with line
        d.process(line)

        if not line.strip() and record:
            yield d
            record = False

for record in fread():
    print(record)

In this data sample, everything works as expected ('\n' marks an empty line):

Record 1
data a
data b
data c
\n
Record 2
data a
data b
data c
\n
\n

But in this one, the last record isn't returned:

Record 1
data a
data b
data c
\n
Record 2
data a
data b
data c
\n

How can I preserve the last newline from the file to get the last record?

PS: I'm using the term "preserve" because I couldn't find a better one.

Thanks.

Edit: The original code was a stripped-down version, just to illustrate the problem, but it seems I stripped too much. I have now posted the function's full code.

A little more explanation: a SomeObject is created for each record in the file, and the records are separated by empty lines. At the end of a record the function yields the object so I can use it (save it to a DB, compare it to other objects, etc.).

The main problem is that when the file ends in a single newline, the last record isn't yielded. It seems that Python does not read the last line when it's blank.

Upvotes: 2

Views: 722

Answers (5)

Jarret Hardie

Reputation: 97902

You might find that a slight twist in a more classically Pythonic direction improves the predictability of the code:

def fread():
    with open('text.txt') as f:
        for line in f:
            if line.strip():
                d = SomeObject()
                yield d
    # No explicit "raise StopIteration": the generator ends when the function returns.

for record in fread():
    print(record)

A generator ends when its function returns or falls off the end, so an explicit raise StopIteration is not needed; since Python 3.7 (PEP 479), raising StopIteration inside a generator body is converted into a RuntimeError. Using if line.strip() simply means that you'll do the yield if there's anything remaining in line after stripping whitespace. The construction of SomeObject() can be anywhere... I just happened to move it in case construction of SomeObject was expensive, or had side effects that shouldn't happen if the line is empty.

EDIT: I'll leave my answer here for posterity's sake, but DNS below got the original intent right, where several lines contribute to the same SomeObject() record (which I totally glossed over).

Upvotes: 5

tgray

Reputation: 8966

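Replace open('somefile.txt'): with open('somefile.txt').read().split('\n'): and your code will work, because split('\n') leaves an empty string after the final newline, and that empty string acts as the missing blank line.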

But Jarret Hardie's answer is better.
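
To see why the split('\n') variant picks up the last record, it may help to compare what the two approaches produce for text that ends in a single newline. This is a minimal sketch: io.StringIO stands in for the file, and the sample text is made up for illustration.

```python
import io

# Hypothetical sample: one record, ending in a single newline.
text = "Record 1\ndata a\n"

# Iterating over a file-like object yields only the lines that exist;
# there is no extra blank line after the final "\n".
iterated = list(io.StringIO(text))

# str.split('\n') also returns the empty string after the final "\n",
# which the `not line.strip()` check then treats as a blank line.
split_lines = text.split("\n")

print(iterated)     # ['Record 1\n', 'data a\n']
print(split_lines)  # ['Record 1', 'data a', '']
```

Note that read() loads the whole file into memory, which may matter for large inputs.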

Upvotes: 0

DNS

Reputation: 38189

The way it's written now probably doesn't work anyway; with d = SomeObject() inside your loop, a new SomeObject is being created for every line. Yet, if I understand correctly, what you want is for all of the lines in between empty lines to contribute to that one object. You could do something like this instead:

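def fread():
    d = None
    with open('somefile.txt') as f:
        for line in f:
            if line.strip():
                if d is None:
                    # first non-blank line starts a new record
                    d = SomeObject()
                d.process(line)
            elif d is not None:
                # a blank line closes the current record
                yield d
                d = None

    # last record, if the file did not end with a blank line
    if d is not None:
        yield d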

This isn't great code, but it does work; the last object, which is missing its trailing empty line, is yielded when the loop is done.
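
For reference, here is a self-contained sketch of the same idea that can be run without a file on disk. SomeObject below is a hypothetical stand-in that just collects lines (the real class is not shown in the question), and this fread takes any iterable of lines instead of opening somefile.txt. The object is only created on a non-blank line, so consecutive blank lines do not produce empty records.

```python
import io


class SomeObject:
    """Hypothetical stand-in for the asker's class: collects raw lines."""

    def __init__(self):
        self.lines = []

    def process(self, line):
        self.lines.append(line.rstrip("\n"))


def fread(lines):
    """Yield one SomeObject per blank-line-separated record."""
    d = None
    for line in lines:
        if line.strip():
            if d is None:
                d = SomeObject()  # first non-blank line starts a record
            d.process(line)
        elif d is not None:
            yield d  # a blank line closes the current record
            d = None
    if d is not None:
        yield d  # last record when the input has no trailing blank line


# Made-up sample data: the same two records, with one or two final newlines.
one_newline = "Record 1\ndata a\n\nRecord 2\ndata a\n"
two_newlines = one_newline + "\n"

for text in (one_newline, two_newlines):
    records = [r.lines for r in fread(io.StringIO(text))]
    print(records)
```

Both inputs should print the same two records, regardless of how many newlines end the text.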

Upvotes: 6

Jacob Gabrielson

Reputation: 36174

If you call readline repeatedly (in a loop) on your file object, instead of iterating over it with for ... in, it should work as you expect, because readline returns an empty string at end of file. Compare these:

>>> x = open('/tmp/xyz')
>>> x.readline()
'x\n'
>>> x.readline()
'\n'
>>> x.readline()
'y\n'
>>> x.readline()
''
>>> open('/tmp/xyz').readlines()
['x\n', '\n', 'y\n']
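
One way to apply this to the question's loop, as a rough sketch: readline() returns '' at end of file, and ''.strip() is empty just like a blank line, so the same blank-line check can close the last record. The helper name iter_with_eof and the use of io.StringIO in place of a real file are assumptions for illustration.

```python
import io


def iter_with_eof(f):
    """Yield each line from f, then the final '' that readline() returns at EOF."""
    while True:
        line = f.readline()
        yield line
        if line == "":  # readline() returns '' only at end of file
            break


sample = io.StringIO("x\n\ny\n")
print(list(iter_with_eof(sample)))  # ['x\n', '\n', 'y\n', '']
```

In the question's fread, looping over iter_with_eof(open('somefile.txt')) instead of the file itself would give the `not line.strip() and record` check one more chance to fire at end of file.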

Upvotes: 0

f3lix

Reputation: 29877

line.strip() results in an empty string for an empty line. An empty string is falsy, so the empty line is swallowed:

>>> bool("\n".strip())
False
>>> bool("\n")
True

Upvotes: 0
