mike rodent

Python encoding issue with mysql.connector - fetching data

I want to retrieve some data from a database. All of its tables use the utf8_general_ci collation.

By the way, this is a .cgi file, so it is executed by means of an Ajax call.

I'm doing this to make the connection:

#!/home/mike/python_venvs/test_venv369/bin/python
import mysql.connector
...

conn = mysql.connector.connect(host='', database='test_kernel',
                               user='root', password='root',
                               charset='utf8', use_unicode=True)
...
query = "SELECT * from invoices limit 2"
cursor.execute(query)

for x in cursor:
    print(type(x))  # a tuple, i.e. the row
    for y in x:
        print(type(y))  # the problem field is a str
        # NOTE: type(y) is a class, never the string 'str', so this test is
        # always False and encode() never runs; isinstance(y, str) would match.
        if type(y) == 'str':
            y = y.encode('utf-8')
        print(y)  # probably where the error is raised when stdout is ASCII

When this runs I get:

<class 'UnicodeEncodeError'> 'ascii' codec can't encode character '\xa3' in position 0: ordinal not in range(128)

Every permutation I've tried gives the same error. '\xa3', by the way, is the non-ASCII '£' character.

I've tried many different approaches, mostly found here on SO (encode, decode, ...), and nothing seems to work. I thought str encoding problems like this were a Python 2 thing, but this is definitely a Python 3 program: I checked with sys.version_info[0].
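To make the symptom easier to reason about, here is a minimal sketch that reproduces the same error without a database. It assumes the failure comes from the output stream's encoding rather than from mysql.connector; write_text is a hypothetical helper written for this illustration, not part of any library.

```python
import io


def write_text(text, encoding):
    """Write text through a text stream that uses `encoding`; return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


pound = '\xa3'  # the '£' character from the error message

# In Python 3, str is Unicode text; bytes only appear when it is written out.
print(type(pound).__name__)

# A UTF-8 stream encodes the character as two bytes.
print(write_text(pound, 'utf-8'))

# An ASCII stream cannot represent it and raises the error seen in the question.
try:
    write_text(pound, 'ascii')
except UnicodeEncodeError as exc:
    print(type(exc).__name__, exc.reason)
```

If this picture is right, calling encode() or decode() on the value cannot help, because the str coming back from the database is already correct and only the stream it is printed to is the problem.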

Answers (1)

mike rodent

Thanks to the help of snakecharmerb's comments, which then led me to this answer, I found a solution which works:

import codecs
import sys

# Replace stdout with a writer that always encodes as UTF-8
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.buffer)

I think this is only a workaround. It'd be great if anyone could explain how locale.getpreferredencoding() comes to return ASCII (ANSI_X3.4-1968) here, and even better if they could say how to set it to something else.
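For comparison, the same kind of workaround can be written with io.TextIOWrapper instead of codecs. This is only a sketch: force_utf8 is a hypothetical helper name, and the demonstration uses an in-memory stand-in for an ASCII stdout rather than the real CGI stream.

```python
import io


def force_utf8(stream):
    """Return a text stream writing UTF-8 to the same underlying byte buffer."""
    return io.TextIOWrapper(stream.buffer, encoding='utf-8', line_buffering=True)


# Stand-in for the ASCII-encoded stdout that the CGI script receives.
raw = io.BytesIO()
ascii_stdout = io.TextIOWrapper(raw, encoding='ascii')

# Re-wrap the byte buffer so that text is encoded as UTF-8 from now on.
fixed = force_utf8(ascii_stdout)
print('\xa3', file=fixed)
fixed.flush()

print(raw.getvalue())  # b'\xc2\xa3\n'

# In the CGI script itself this would be, before the first print():
#     sys.stdout = force_utf8(sys.stdout)
```

Either way the change has to happen before anything non-ASCII is printed. Python also documents the PYTHONIOENCODING environment variable for overriding the stdout encoding, but whether Apache passes it through to the script is something I have not verified here.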

The culprit is probably Apache, though I'm far from sure.

The question referenced by snakecharmerb unfortunately did not provide a solution for me. I added (or rather uncommented) the following line in /etc/apache2/conf-enabled/charset.conf:

AddDefaultCharset UTF-8

... and restarted Apache. No change.

Edit
Output of some locale settings for root (su) that might be involved:

M17A ~ # locale
LANG=en_GB.UTF-8
LANGUAGE=en_GB:en
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
M17A ~ # echo $LANG
en_GB.UTF-8
M17A ~ # locale charmap
UTF-8

I believe it is indeed root (su) that runs the Apache process.

Edit 2
I thought I'd look into the ownership of the processes on my machine. So I ran ps aux. Some possibly relevant processes came up which are not owned by me or by root:

USER # i.e. owner
...
mysql     1413  0.0  0.1 1419400 16760 ?       Ssl  May15   0:50 /usr/sbin/mysqld
...
www-data  5825  0.0  0.0 143296  5536 ?        S    07:35   0:00 /usr/sbin/apache2 -k start
www-data  5826  0.0  0.1 298492 21900 ?        S    07:35   0:00 /usr/sbin/apache2 -k start
www-data  5827  0.0  0.1 298096 18700 ?        S    07:35   0:00 /usr/sbin/apache2 -k start
www-data  5828  0.0  0.0 296044 15872 ?        S    07:35   0:00 /usr/sbin/apache2 -k start
www-data  5829  0.0  0.1 296040 16876 ?        S    07:35   0:00 /usr/sbin/apache2 -k start
www-data  5830  0.0  0.0 296052  7972 ?        S    07:35   0:00 /usr/sbin/apache2 -k start
...
www-data  9636  0.0  0.0 296052  7856 ?        S    08:16   0:00 /usr/sbin/apache2 -k start
www-data  9639  0.0  0.0 295572  6324 ?        S    08:16   0:00 /usr/sbin/apache2 -k start
www-data  9640  0.0  0.0 295572  6324 ?        S    08:16   0:00 /usr/sbin/apache2 -k start
www-data  9641  0.0  0.0 295572  6324 ?        S    08:16   0:00 /usr/sbin/apache2 -k start

Maybe one of these owners is using this ASCII encoding? I wonder how I might find out...
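One way to find out might be to have the CGI script report its own settings, since the locale output above comes from root's login shell and the script runs with whatever environment Apache gives it. The sketch below is an assumption about what is worth inspecting; encoding_report is a hypothetical helper, and it prints only ASCII so that it works even with an ASCII stdout.

```python
import locale
import os
import sys


def encoding_report(environ=os.environ):
    """Collect the settings that decide which encoding print() uses."""
    lines = [
        'stdout encoding: {}'.format(getattr(sys.stdout, 'encoding', None)),
        'preferred encoding: {}'.format(locale.getpreferredencoding()),
    ]
    # Environment variables that commonly influence the locale and stdout.
    for name in ('LANG', 'LC_ALL', 'LC_CTYPE', 'PYTHONIOENCODING'):
        lines.append('{}: {}'.format(name, environ.get(name, '(unset)')))
    return lines


# As a CGI response: plain-text header, blank line, then the report.
print('Content-Type: text/plain')
print()
print('\n'.join(encoding_report()))
```

Requesting this script through Apache and comparing the result with the same script run from a shell should show whether the www-data processes see a different LANG. On Debian-style installs /etc/apache2/envvars is one place where LANG is set for Apache, though I have not confirmed that this is what happens on this machine.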
