Neil C. Obremski

Reputation: 20274

How to write UTF-8 CSV into BytesIO in Python3?

I understand how to write UTF-8 from strings in Python 3, and that StringIO is the recommended way to build strings in memory. However, I specifically need a binary file-like object, which means BytesIO. If I do the following, the load blows up because the data gets read as Latin-1, my computer's default locale/charset.

import csv
import io

with io.StringIO() as sb:
    csv.writer(sb).writerows(rows)
    sb.flush()
    sb.seek(0)
    # blows up with a Latin-1 encoding error
    job = bq.load_table_from_file(sb, table_ref, job_config=job_config)

So my work-around is this monstrosity that doubles the amount of memory used:

with io.StringIO() as sb:
    csv.writer(sb).writerows(rows)
    sb.flush()
    sb.seek(0)
    with io.BytesIO(sb.getvalue().encode('utf-8')) as buffer:
        job = bq.load_table_from_file(buffer, table_ref, job_config=job_config)

Somewhere in this chain there must be a way to specify the byte-encoding so that readers of the file-like sb will see the data as UTF-8. Or is there a way to use csv.writer() with a byte stream?

I've looked for answers to both questions on Stack Overflow, but what I've found mostly covers writing to files; for in-memory data, everything points to StringIO.

Upvotes: 6

Views: 8108

Answers (1)

Neil C. Obremski

Reputation: 20274

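The io.TextIOWrapper class does the job: it wraps a binary stream and encodes the text written to it. Be careful, though: closing the wrapper, which is what happens when it is used as a context manager, also closes the underlying stream and makes the original BytesIO object unusable. So keep the BytesIO in the with statement and leave the wrapper out of it.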

Modifying my original example:

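with io.BytesIO() as buffer:
    # Wrap the binary buffer so csv.writer can write text that is encoded as UTF-8.
    sb = io.TextIOWrapper(buffer, encoding='utf-8', newline='')
    csv.writer(sb).writerows(rows)
    # Push any text still held by the wrapper into the buffer before reading it.
    sb.flush()
    buffer.seek(0)
    job = bq.load_table_from_file(buffer, table_ref, job_config=job_config)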
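
For anyone who wants to try this without a BigQuery client, here is a minimal sketch of the same idea. The helper name rows_to_utf8_csv and the sample rows are my own, purely for illustration. It also calls detach() on the wrapper, which as far as I can tell is the cleanest way to make sure a discarded wrapper cannot close the BytesIO behind your back.

```python
import csv
import io


def rows_to_utf8_csv(rows):
    """Return a BytesIO holding `rows` as UTF-8 encoded CSV, rewound to the start.

    Hypothetical helper for illustration; not part of any library.
    """
    buffer = io.BytesIO()
    # newline='' leaves the '\r\n' emitted by csv.writer untouched.
    wrapper = io.TextIOWrapper(buffer, encoding='utf-8', newline='')
    csv.writer(wrapper).writerows(rows)
    # Push any text still held by the wrapper into the buffer.
    wrapper.flush()
    # Separate the wrapper from the buffer so that the wrapper being
    # garbage-collected cannot close the BytesIO.
    wrapper.detach()
    buffer.seek(0)
    return buffer


buffer = rows_to_utf8_csv([['id', 'name'], [1, 'Zoë'], [2, '東京']])
data = buffer.read()
print(data.decode('utf-8'))
```

The returned buffer is an ordinary binary file-like object, so it should be usable anywhere one is expected, including the load_table_from_file call above.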

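Another caveat is the newline parameter. If left at its default of None, the wrapper translates every '\n' written to the platform's line separator, so on Windows the '\r\n' that csv.writer emits would become '\r\r\n'. Set newline='' to prevent this.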
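
To see the newline translation on any platform, here is a small sketch. The function csv_bytes is a hypothetical helper of mine, and passing newline='\r\n' explicitly is only a stand-in for what the default would do on Windows.

```python
import csv
import io


def csv_bytes(rows, newline):
    """Write `rows` through a TextIOWrapper with the given `newline` setting.

    Hypothetical helper for illustration; returns the raw bytes produced.
    """
    buffer = io.BytesIO()
    wrapper = io.TextIOWrapper(buffer, encoding='utf-8', newline=newline)
    csv.writer(wrapper).writerows(rows)
    wrapper.flush()
    wrapper.detach()  # keep the BytesIO open after the wrapper goes away
    return buffer.getvalue()


rows = [['a', 'b']]
# newline='' passes the writer's '\r\n' through unchanged.
untranslated = csv_bytes(rows, newline='')
# newline='\r\n' expands each '\n' written to '\r\n', which doubles the
# carriage return that csv.writer already emitted.
translated = csv_bytes(rows, newline='\r\n')
print(untranslated)
print(translated)
```

The second result carries a stray extra carriage return per row, which is the kind of corruption newline='' avoids.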

Upvotes: 6
