Reputation: 249293
In Python 2 this code does what I'd expect:
import csv
import sys
writer = csv.writer(sys.stdout)
writer.writerow([u'hello', b'world'])
It prints:
hello,world
But in Python 3, bytes are printed with a prefix and quotes:
hello,b'world'
Since CSV is a generic data interchange format, and since no system other than Python knows what b'' is, I need to disable this behavior. But I haven't figured out how.
Of course I could call bytes.decode on all the bytes first, but that is inconvenient and inefficient. What I really want is either to write the literal bytes to the file, or to pass an encoding (e.g. 'ascii') to csv.writer() so it knows how to decode any bytes objects it sees.
Upvotes: 3
Views: 2374
Reputation: 123483
I don't think there's any way to avoid explicitly converting the byte strings into unicode strings with the csv module in Python 3. In Python 2, the conversion between the two was implicit and used the ASCII codec.
To make this easier, you could effectively subclass csv.writer by wrapping the writer object, as shown below, which makes the process more convenient.
import csv

class CSV_Writer(object):
    def __init__(self, *args, **kwrds):
        self.csv_writer = csv.writer(*args, **kwrds)

    def __getattr__(self, name):
        return getattr(self.csv_writer, name)

    def writerow(self, row):
        self.csv_writer.writerow(str(v, encoding='utf-8') if isinstance(v, bytes)
                                 else v for v in row)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

with open('bytes_test.csv', 'w', newline='') as file:
    writer = CSV_Writer(file)
    writer.writerow([u'hello', b'world'])
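Since the question asks for a way to pass an encoding such as 'ascii', here is a minimal sketch of the same wrapping idea with a configurable codec. The class name DecodingWriter and its encoding parameter are hypothetical names chosen for illustration; they are not part of the csv module. It writes to an in-memory io.StringIO so the result can be inspected directly.

```python
import csv
import io


class DecodingWriter:
    """Hypothetical wrapper: decodes bytes fields before passing rows to csv.writer."""

    def __init__(self, fileobj, encoding="ascii", **kwargs):
        # Remaining keyword arguments (dialect options etc.) go to csv.writer.
        self._writer = csv.writer(fileobj, **kwargs)
        self._encoding = encoding

    def _decode(self, value):
        # Only bytes-like values are decoded; everything else passes through unchanged.
        if isinstance(value, (bytes, bytearray)):
            return value.decode(self._encoding)
        return value

    def writerow(self, row):
        return self._writer.writerow([self._decode(v) for v in row])

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


buffer = io.StringIO()
writer = DecodingWriter(buffer, encoding="ascii")
writer.writerow(["hello", b"world"])
output = buffer.getvalue()
print(repr(output))
```

With a strict codec like 'ascii', a bytes value containing non-ASCII data raises UnicodeDecodeError instead of being written silently, which may or may not be what you want.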
Upvotes: 1
Reputation: 177755
csv writes text files and expects Unicode (text) strings in Python 3.
csv writes binary files and expects byte strings in Python 2, but allows implicit encoding of Unicode strings to byte strings using the default ascii codec. Python 3 does not allow implicit conversion, so you can't really avoid it:
#!python3
import csv
import sys
writer = csv.writer(sys.stdout)
writer.writerow(['hello', b'world'.decode()])
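To see where the b'' prefix comes from, the following sketch compares the two cases side by side. The csv module stringifies non-string values with str(), and str() of a bytes object returns its repr-style form. The variable names and the use of io.StringIO in place of sys.stdout are choices made for this illustration only.

```python
import csv
import io

# Undecoded: csv applies str() to the bytes object, yielding "b'world'".
raw = io.StringIO()
csv.writer(raw).writerow(["hello", b"world"])
undecoded = raw.getvalue()

# Decoded explicitly: the field is already a text string.
fixed = io.StringIO()
csv.writer(fixed).writerow(["hello", b"world".decode("ascii")])
decoded = fixed.getvalue()

print(repr(undecoded))
print(repr(decoded))
```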
Upvotes: 1