John Zwinck

Reputation: 249293

csv.writer prints "bytes" with prefix and quotes

In Python 2 this code does what I'd expect:

import csv
import sys

writer = csv.writer(sys.stdout)
writer.writerow([u'hello', b'world'])

It prints:

hello,world

But in Python 3, bytes are printed with a prefix and quotes:

hello,b'world'
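The prefix and quotes come from the fact that csv.writer converts any non-string value with str() before writing it, and str() of a bytes object yields its literal form, b'...'. A minimal check, writing into an in-memory io.StringIO buffer (used here only so the output can be inspected rather than printed to the console):

```python
import csv
import io

# Write the same row as above, but into an in-memory text buffer.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['hello', b'world'])

# csv.writer called str() on the bytes value, which produces "b'world'".
print(buf.getvalue())
print(str(b'world'))
```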

Since CSV is a generic data interchange format, and since no system other than Python knows what b'' is, I need to disable this behavior. But I haven't figured out how.

Of course I could call bytes.decode on all the bytes values first, but that is inconvenient and inefficient. What I really want is either to write the literal bytes to the file, or to pass an encoding (e.g. 'ascii') to csv.writer() so that it knows how to decode any bytes objects it sees.

Upvotes: 3

Views: 2374

Answers (2)

martineau

Reputation: 123483

I don't think there's any way to avoid explicitly converting the byte strings into Unicode strings when using the csv module in Python 3. In Python 2, the csv module works with byte strings, and Unicode strings are implicitly encoded using the default ASCII codec.

To make this more convenient, you can wrap the object returned by csv.writer (csv.writer is a factory function, so it can't be subclassed directly) in a class that decodes any bytes values before passing them on, as shown below.

import csv

class CSV_Writer(object):
    """Wraps a csv.writer and decodes bytes values to str before writing."""

    def __init__(self, *args, **kwrds):
        self.csv_writer = csv.writer(*args, **kwrds)

    def __getattr__(self, name):
        # Delegate everything else (e.g. the dialect attribute) to the real writer.
        return getattr(self.csv_writer, name)

    def writerow(self, row):
        # Decode bytes as UTF-8 and leave all other values untouched.
        return self.csv_writer.writerow(str(v, encoding='utf-8') if isinstance(v, bytes)
                                        else v for v in row)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


with open('bytes_test.csv', 'w', newline='') as file:
    writer = CSV_Writer(file)
    writer.writerow([u'hello', b'world'])
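A lighter-weight variation on the same idea, closer to the "pass an encoding" behaviour the question asks for, is a generator that decodes rows on the fly and is fed to a plain csv.writer. This is only a sketch: decode_bytes_in_rows is a hypothetical helper name, and the 'ascii' default simply mirrors the example encoding from the question.

```python
import csv
import io

def decode_bytes_in_rows(rows, encoding='ascii'):
    """Hypothetical helper: yield each row with bytes values decoded to str."""
    for row in rows:
        yield [v.decode(encoding) if isinstance(v, bytes) else v for v in row]

buf = io.StringIO()
writer = csv.writer(buf)
# writerows accepts any iterable of rows, so the generator is consumed lazily.
writer.writerows(decode_bytes_in_rows([[u'hello', b'world'], [b'foo', 42]]))
print(buf.getvalue())
```

Because the rows are decoded lazily, nothing is copied up front, and with the 'ascii' default any non-ASCII bytes raise UnicodeDecodeError instead of being silently misinterpreted.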

Upvotes: 1

Mark Tolonen

Reputation: 177755

csv writes text files and expects Unicode (text) strings in Python 3.

In Python 2, csv writes binary files and expects byte strings, but it allows Unicode strings to be implicitly encoded to byte strings using the default ascii codec. Python 3 does not allow implicit conversion between bytes and str, so you can't really avoid decoding explicitly:

#!python3
import csv
import sys
writer = csv.writer(sys.stdout)
writer.writerow(['hello', b'world'.decode()])

Upvotes: 1
