Reputation: 143
I have data in a database that I want to export to a CSV file. The data is in Japanese, and for further use it needs to be encoded in UTF-8.
This is my script to fetch the data and write it to the CSV file:
import mysql.connector
from mysql.connector import errorcode
import sys
import csv

query = 'SELECT * FROM `images-data`'

try:
    cnx = mysql.connector.connect(user='root', password='1234',
                                  host='127.0.0.1',
                                  database='sotsuken-test-db')
    cur = cnx.cursor()
    cur.execute(query)
    result = cur.fetchall()
    c = csv.writer(open("db-data.csv", "w"))
    for row in result:
        c.writerow(row)
except mysql.connector.Error as err:
    if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
        print("Something is wrong with your user name or password")
    elif err.errno == errorcode.ER_BAD_DB_ERROR:
        print("Database does not exist")
    else:
        print(err)
else:
    cnx.close()
I can create the CSV file, and the data is exported in UTF-8, but the data in my CSV file looks like this:
1,b'\xe3\x83\x86\xe3\x82\xb9\xe3\x83\x88\xe3\x81\xa7\xe3\x81\x94\xe3\x81\x8a\xe3\x81\x96\xe3\x81\x84\xe3\x81\xbe\xe3\x81\x99'
From my research, it seems my data was written as a bytes object (maybe I'm wrong here). I need the data to be written in UTF-8 without the b''.
I know that I can use decode("UTF-8") to turn it into a string, but I cannot apply that to csv.writerow.
Can you give me any suggestions?
Upvotes: 3
Views: 2420
Reputation: 365737
csv.writer.writerow just takes a list of whatever you've got,1 calls str on each element, and puts them together into a line of CSV. So, if what you've got is bytes objects, it's going to call str on them and get strings like b'\xe3\x83\x86\xe3\x82\xb9\xe3\x83\x88\xe3\x81\xa7\xe3\x81\x94\xe3\x81\x8a\xe3\x81\x96\xe3\x81\x84\xe3\x81\xbe\xe3\x81\x99'.
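As a minimal sketch of that behavior (the cell values here are made up, and an in-memory buffer stands in for the real file), writing a bytes cell through the csv module produces exactly that b'...' text:

```python
import csv
import io

# csv.writer stringifies each cell, so a bytes cell keeps its b'...' repr.
buf = io.StringIO()
w = csv.writer(buf)
w.writerow([1, 'テスト'.encode('utf-8')])
print(buf.getvalue())  # 1,b'\xe3\x83\x86\xe3\x82\xb9\xe3\x83\x88'
```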
The solution is to give it strings. The simplest way to do that is by decoding those bytes objects:
I know that I can use decode("UTF-8") to make it into the string, but I cannot apply that to csv.writerow
Sure you can. I don't know what your columns are, so I'll make up my own example:
for image_id, image_name in result:
    c.writerow([image_id, image_name.decode('UTF-8')])
That's really all there is to it.
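To see it end to end without a database, here's a self-contained sketch; the row below is invented to look like what cur.fetchall() might return for this table:

```python
import csv
import io

# Hypothetical row shaped like a fetchall() result: (id, bytes column)
result = [(1, 'テストでございます'.encode('utf-8'))]

buf = io.StringIO()
c = csv.writer(buf)
for image_id, image_name in result:
    # Decode the bytes column before handing it to the writer
    c.writerow([image_id, image_name.decode('UTF-8')])

print(buf.getvalue())  # 1,テストでございます
```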
What if your code has absolutely no idea what the columns in the database are? Well, in that case, you have to get a little bit clever. As long as we know that all of the bytes columns are really Unicode text in disguise as UTF-8, you can just switch on the type:
for row in result:
    textrow = [col.decode('UTF-8') if isinstance(col, bytes) else col for col in row]
    c.writerow(textrow)
That's a bit ugly, but reading * from a database where you have no idea what the columns are is inherently an ugly problem.
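A quick sketch of that type switch on made-up mixed-type rows, as a SELECT * over unknown columns might produce; only the bytes cells get decoded, everything else passes through untouched:

```python
import csv
import io

# Invented rows: int, bytes, float columns
result = [(1, 'テスト'.encode('utf-8'), 3.5),
          (2, '画像'.encode('utf-8'), 7.25)]

buf = io.StringIO()
c = csv.writer(buf)
for row in result:
    # Decode only the bytes cells; leave other types alone
    textrow = [col.decode('UTF-8') if isinstance(col, bytes) else col
               for col in row]
    c.writerow(textrow)

print(buf.getvalue())
```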
However, there's probably a much better solution.
MySQL Connector/Python turns all CHAR, TEXT, and similar columns into Unicode str values by default.2 But BINARY, BLOB, and similar columns are always returned as bytes.
If this column is meant to represent Unicode text, set it to a text type instead of a binary one in the database. Then you won't have a problem in the first place, in this script or in any other tools.
1. Actually, it appears to be illegal to call it with a list of anything but strings and numbers, according to the docs. But practically, it takes anything at all; it just doesn't do anything very useful with any types besides strings and numbers…
2. That is, unless you explicitly pass use_unicode=False in the connect call, or similarly set it elsewhere.
Upvotes: 2