Reputation: 15642
I want to retrieve some data from a database. All the tables in it have the utf8_general_ci collation.
By the way, this is a .cgi file, so it is executed by means of an Ajax call.
I'm doing this to make the connection:
#!/home/mike/python_venvs/test_venv369/bin/python
...
conn = mysql.connector.connect( host='', database='test_kernel',
                                user='root', password='root',
                                charset='utf8', use_unicode=True )
...
query = ("SELECT * from invoices limit 2")
cursor.execute( query )
for x in cursor:
    print( type( x )) # is a tuple, i.e. the row
    for y in x:
        print( type( y ) ) # the problem field prints "str"
        if type( y ) == 'str':
            y = y.encode( 'utf-8')
        print( y )
On the encoding line above I get:
<class 'UnicodeEncodeError'> 'ascii' codec can't encode character '\xa3' in position 0: ordinal not in range(128)
With all the permutations I've tried I get the same thing. '\xa3', by the way, is the '£' character, non-ASCII.
I've tried many different approaches, found mainly here on SO: encode, decode, ... Nothing seems to work. I thought the str type was Python 2... but this is definitely a Python 3 program, something which I actually checked with sys.version_info[ 0 ]!
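For anyone wanting to reproduce this without a database or a web server, here is a minimal sketch. It assumes the failure comes from the output stream's encoding rather than from the connector; `write_pound` is a hypothetical helper, and the in-memory stream only stands in for `sys.stdout`.

```python
import io

def write_pound(encoding):
    """Write '£' through a text stream that uses the given encoding
    and return the bytes that reached the underlying buffer."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write('\xa3')  # '£', the character from the error message
    stream.flush()
    data = raw.getvalue()
    stream.detach()       # keep the wrapper from closing the buffer
    return data

try:
    write_pound('ascii')
except UnicodeEncodeError as exc:
    # Same kind of error as in the question: 'ascii' codec can't encode ...
    print('ascii stream failed:', exc)

print(write_pound('utf-8'))
```

If the ASCII case fails and the UTF-8 case succeeds, the str coming back from the database is probably fine and it is the stream being printed to that cannot represent '£'.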
Reputation: 15642
Thanks to the help of snakecharmerb's comments, which then led me to this answer, I found a solution which works:
import codecs
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.buffer)
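To see what that wrapper does, here is a small sketch that applies the same `codecs.getwriter` call to an in-memory buffer instead of `sys.stdout.buffer`. The names `utf8_writer` and `fake_stdout_buffer` are made up for illustration.

```python
import codecs
import io

def utf8_writer(binary_stream):
    """Wrap a binary stream so that any str written to it is encoded as UTF-8."""
    return codecs.getwriter('utf-8')(binary_stream)

# Stand-in for sys.stdout.buffer so the resulting bytes can be inspected.
fake_stdout_buffer = io.BytesIO()
out = utf8_writer(fake_stdout_buffer)
out.write('\xa3 12.50\n')

# The pound sign arrives as the two-byte UTF-8 sequence C2 A3.
print(fake_stdout_buffer.getvalue())
```

On Python 3.7 or later, `sys.stdout.reconfigure(encoding='utf-8')` should achieve much the same thing, but the interpreter here appears to be 3.6, where that method is not available.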
I think this constitutes a workaround, and it'd be great if anyone could explain how locale.getpreferredencoding() comes to be set to ASCII/ANSI_X3.4-1968 ... even better if they could then say how to set it to something else.
The culprit is probably Apache, though I'm far from sure.
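One way to test the Apache theory is to have the CGI script report what it actually sees. The sketch below is only a diagnostic under assumptions: `encoding_report` and `format_report` are hypothetical helpers, and the list of variables is my guess at what is relevant. On Debian-style installs it may also be worth looking at /etc/apache2/envvars, which I believe often exports LANG=C for the server process.

```python
import locale
import os
import sys

def encoding_report(environ=None):
    """Collect the settings that usually decide which encoding print() uses."""
    environ = os.environ if environ is None else environ
    return {
        'stdout_encoding': getattr(sys.stdout, 'encoding', None),
        'preferred_encoding': locale.getpreferredencoding(False),
        'LANG': environ.get('LANG'),
        'LC_ALL': environ.get('LC_ALL'),
        'LC_CTYPE': environ.get('LC_CTYPE'),
        'PYTHONIOENCODING': environ.get('PYTHONIOENCODING'),
    }

def format_report(report):
    """Render the report as ASCII-only text, safe to print on an ASCII stream."""
    return '\n'.join('{}={!a}'.format(key, value)
                     for key, value in sorted(report.items()))

print(format_report(encoding_report()))
```

Running this once from a shell and once through the Ajax call should show whether the two environments differ. If LANG comes back as C or None under Apache, that would explain the ASCII default.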
The question referenced by snakecharmerb unfortunately did not provide a solution for me: I added (or rather uncommented) the following line in /etc/apache2/conf-enabled/charset.conf
AddDefaultCharset UTF-8
... and restarted Apache. No change.
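As far as I understand it, AddDefaultCharset only affects the charset parameter Apache puts on the Content-Type header of text/plain and text/html responses, so it would not change the encoding Python chooses for sys.stdout. A sketch of sidestepping the text layer altogether by building the CGI response as bytes follows; `build_cgi_response` is a hypothetical helper and the HTML body is invented for the example.

```python
def build_cgi_response(body, charset='utf-8'):
    """Return a complete CGI response (header, blank line, body) as bytes."""
    header = 'Content-Type: text/html; charset={}\r\n\r\n'.format(charset)
    # The header is plain ASCII; only the body needs the real charset.
    return header.encode('ascii') + body.encode(charset)

response = build_cgi_response('<p>Total: \xa3 20</p>')
print(response)

# In the real script the bytes would go out via the binary layer, e.g.:
#     sys.stdout.buffer.write(response)
```

Writing bytes to `sys.stdout.buffer` means the locale-derived default encoding never gets a chance to interfere.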
Edit
Output from various settings for su which might be involved:
M17A ~ # locale
LANG=en_GB.UTF-8
LANGUAGE=en_GB:en
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
M17A ~ # echo $LANG
en_GB.UTF-8
M17A ~ # locale charmap
UTF-8
I believe it is su/root which is indeed running the Apache process.
Edit 2
I thought I'd look into the ownership of the processes on my machine. So I ran ps aux. Some possibly relevant processes came up which are not owned by me or by root:
USER # i.e. owner
...
mysql 1413 0.0 0.1 1419400 16760 ? Ssl May15 0:50 /usr/sbin/mysqld
...
www-data 5825 0.0 0.0 143296 5536 ? S 07:35 0:00 /usr/sbin/apache2 -k start
www-data 5826 0.0 0.1 298492 21900 ? S 07:35 0:00 /usr/sbin/apache2 -k start
www-data 5827 0.0 0.1 298096 18700 ? S 07:35 0:00 /usr/sbin/apache2 -k start
www-data 5828 0.0 0.0 296044 15872 ? S 07:35 0:00 /usr/sbin/apache2 -k start
www-data 5829 0.0 0.1 296040 16876 ? S 07:35 0:00 /usr/sbin/apache2 -k start
www-data 5830 0.0 0.0 296052 7972 ? S 07:35 0:00 /usr/sbin/apache2 -k start
...
www-data 9636 0.0 0.0 296052 7856 ? S 08:16 0:00 /usr/sbin/apache2 -k start
www-data 9639 0.0 0.0 295572 6324 ? S 08:16 0:00 /usr/sbin/apache2 -k start
www-data 9640 0.0 0.0 295572 6324 ? S 08:16 0:00 /usr/sbin/apache2 -k start
www-data 9641 0.0 0.0 295572 6324 ? S 08:16 0:00 /usr/sbin/apache2 -k start
Maybe one of these owners is using this ASCII encoding? I wonder how I might find out...
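One possible way to find out on Linux is to read the environment a process was started with from /proc/<pid>/environ. This is a sketch, not a tested recipe for this machine: `parse_environ` and `locale_vars_of` are hypothetical helpers, and reading another user's process normally needs root.

```python
def parse_environ(raw):
    """Turn the NUL-separated contents of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b'\0'):
        if b'=' in entry:
            key, _, value = entry.partition(b'=')
            env[key.decode('utf-8', 'replace')] = value.decode('utf-8', 'replace')
    return env

def locale_vars_of(pid):
    """Return only the locale-related variables of a running process (Linux only)."""
    with open('/proc/{}/environ'.format(pid), 'rb') as fh:
        env = parse_environ(fh.read())
    return {k: v for k, v in env.items() if k == 'LANG' or k.startswith('LC_')}

# Made-up sample data in the same format, so the parser can be tried anywhere.
sample = b'LANG=C\0PATH=/usr/bin\0LC_ALL=\0'
print(parse_environ(sample))
```

Calling `locale_vars_of(5825)` as root, with one of the apache2 PIDs from the listing, should show whether the www-data workers were started with a UTF-8 locale or with something like LANG=C.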