RexSolus

Reputation: 143

(Python) Write string to CSV with UTF-8 Encoding

I have data in a database that I want to export to a CSV file. The data is in Japanese, and for further use it needs to be encoded in UTF-8.

This is my script to fetch the data and write it to the CSV file:

import mysql.connector
from mysql.connector import errorcode

import sys
import csv

query = 'SELECT * FROM `images-data`'

try:
    cnx = mysql.connector.connect(user='root', password='1234',
                                 host='127.0.0.1',
                                 database='sotsuken-test-db')
    cur = cnx.cursor()
    cur.execute(query)
    result = cur.fetchall()

    c = csv.writer(open("db-data.csv", "w"))
    for row in result:
        c.writerow(row)

except mysql.connector.Error as err:
    if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
        print("Something is wrong with your user name or password")
    elif err.errno == errorcode.ER_BAD_DB_ERROR:
        print("Database does not exist")
    else:
        print(err)
else:
    cnx.close()

I can create the CSV file and the data is exported in UTF-8, but the data in the CSV file looks like this:

1,b'\xe3\x83\x86\xe3\x82\xb9\xe3\x83\x88\xe3\x81\xa7\xe3\x81\x94\xe3\x81\x8a\xe3\x81\x96\xe3\x81\x84\xe3\x81\xbe\xe3\x81\x99'

In my research I found that my data is being written as a bytes object (maybe I'm wrong here). I need the data to be written in UTF-8 without the b''.

I know that I can use decode("UTF-8") to turn it into a string, but I cannot apply that to csv.writerow.

Can you give me any suggestions?

Upvotes: 3

Views: 2420

Answers (1)

abarnert

Reputation: 365737

csv.writer.writerow just takes a list of whatever you've got,[1] calls str on each element, and puts them together into a line of CSV.

So, if what you've got is bytes objects, then it's going to call str on them and get strings like b'\xe3\x83\x86\xe3\x82\xb9\xe3\x83\x88\xe3\x81\xa7\xe3\x81\x94\xe3\x81\x8a\xe3\x81\x96\xe3\x81\x84\xe3\x81\xbe\xe3\x81\x99'.
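You can see the same mechanism in isolation. This is just an illustrative sketch; the sample value is made up, not your actual column data:

>>> value = 'テスト'.encode('utf-8')   # roughly what a BLOB column comes back as
>>> print(str(value))                   # str() on bytes gives you its repr, b'...'
b'\xe3\x83\x86\xe3\x82\xb9\xe3\x83\x88'
>>> print(value.decode('utf-8'))        # decoding gives you back the actual text
テスト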

The solution is to give it strings.


The simplest way to do it is by decoding those bytes:

I know that I can use decode("UTF-8") to turn it into a string, but I cannot apply that to csv.writerow

Sure you can. I don't know what your columns are, so I'll make up my own example:

for image_id, image_name in result:
    c.writerow([image_id, image_name.decode('UTF-8')])

That's really all there is to it.
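One related thing worth checking, although it isn't what produces the b'...' text: open("db-data.csv", "w") uses your platform's default encoding, which is not necessarily UTF-8 (on Windows it often isn't). If the output really has to be UTF-8, it's safer to say so explicitly, and the csv docs recommend opening the file with newline='' when you hand it to csv.writer. A sketch, reusing the made-up two columns from above:

with open("db-data.csv", "w", encoding="utf-8", newline="") as csvfile:
    c = csv.writer(csvfile)
    for image_id, image_name in result:
        # decode the bytes column so writerow gets a real str
        c.writerow([image_id, image_name.decode('utf-8')])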


What if your code has absolutely no idea what the columns in the database are? Well, in that case, you have to get a little bit clever. As long as we know that all of the bytes columns are really Unicode text disguised as UTF-8, you can just switch on the type:

for row in result:
    # decode only the bytes columns; leave ints, strs, None, etc. alone
    textrow = [col.decode('UTF-8') if isinstance(col, bytes) else col for col in row]
    c.writerow(textrow)

That's a bit ugly, but reading * from a database where you have no idea what the columns are is inherently an ugly problem.


However, there's probably a much better solution.

MySQL Connector/Python turns all CHAR, TEXT, and similar columns into Unicode str values by default.[2] But BINARY, BLOB, and similar columns are always returned as bytes.

If this column is meant to represent Unicode text, set it to a text type instead of a binary one in the database. Then you won't have a problem in the first place—in this script, or in any other tools.
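For example, something along these lines would do it (a sketch only: the column name image_name and the utf8mb4 character set are assumptions, since the question doesn't show the schema):

# Assumed column name and charset; run once against the existing table.
# After this, Connector/Python returns the column as str, not bytes.
cur.execute(
    "ALTER TABLE `images-data` "
    "MODIFY `image_name` TEXT CHARACTER SET utf8mb4"
)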


[1] Actually, it appears to be illegal to call it with a list of anything but strings and numbers, according to the docs. But practically, it takes anything at all; it just doesn't do anything very useful with any types besides strings and numbers…

[2] That is, unless you explicitly pass use_unicode=False in the connect call or set it similarly elsewhere.
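In other words, with a connection like the one in the question, the default already gives you str for text columns; you'd only get bytes from text columns if that had been turned off. Purely for illustration (use_unicode=True is already the default, shown here only to make the setting visible):

cnx = mysql.connector.connect(user='root', password='1234',
                              host='127.0.0.1',
                              database='sotsuken-test-db',
                              use_unicode=True)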

Upvotes: 2
