Al Smith

Reputation: 1

StringIO appears to behave differently when initialised from a buffer as opposed to having data written into it line by line

I'm trying to read some data and parse it out as CSV. The data format in question comes with a wacky first line that I first need to get rid of.

import csv
import io

delimiter = None
with open('data.csv', 'r', encoding='latin1') as fd:
    input1 = io.StringIO(fd.read())

with open('data.csv', 'r', encoding='latin1') as fd:
    input2 = io.StringIO()
    for line in fd:
        if line.startswith('sep='):
            delimiter = line[4]
        else:
            input2.write(line)

with open('data.csv', 'r', encoding='latin1') as fd:
    buf = ''
    for line in fd:
        if line.startswith('sep='):
            delimiter = line[4]
        else:
            buf += line
    input3 = io.StringIO(buf)

If I also write that first line into input2 and input3, then input1.getvalue() == input2.getvalue() == input3.getvalue(). If I skip it, as above, then at least input2.getvalue() == input3.getvalue().

Then comes the CSV bit:

# inputX stands for one of input1, input2 or input3
inputReader = csv.DictReader(inputX, delimiter=delimiter or ';')
for row in inputReader:
    print(row)

This works for input1, but due to the wacky first line it messes up the column names, as expected.

It works for input3, with correct column names. I'm curious though as to why the for loop doesn't return any results for input2. What's the difference between input2 and input3 at that point?

Upvotes: 0

Views: 304

Answers (1)

user2357112

Reputation: 281551

After the writes, input2 is positioned at the end of the "file", so there is nothing left for the reader to read. Constructing a StringIO directly from a string, as with input3, places the file position at the start.

To fix the input2 code, seek back to the start once you're done writing:

input2.seek(0)
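
A minimal sketch of the difference, in case it helps to see it run. The sample lines are made up to stand in for data.csv, and the variable names (written, initialised, and so on) are only for illustration:

```python
import csv
import io

# Hypothetical sample data standing in for the contents of data.csv
lines = ["sep=;\n", "name;age\n", "Al;42\n"]

delimiter = None
written = io.StringIO()
for line in lines:
    if line.startswith("sep="):
        delimiter = line[4]
    else:
        written.write(line)

# Same content, but passed to the constructor instead of written
initialised = io.StringIO(written.getvalue())

# Writing leaves the position at the end; the constructor leaves it at 0
pos_after_write = written.tell()
pos_after_init = initialised.tell()

# Reading from the end yields no rows at all
rows_before_seek = list(csv.DictReader(written, delimiter=delimiter or ";"))

# After rewinding, the same object parses normally
written.seek(0)
rows_after_seek = list(csv.DictReader(written, delimiter=delimiter or ";"))

print(pos_after_write, pos_after_init)
print(rows_before_seek)
print(rows_after_seek)
```

Checking tell() before handing a StringIO to a reader is a quick way to spot this kind of problem.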

Upvotes: 1
