Reputation: 3671
I'm building a system that inserts rows into a MySQL database through POST requests (the API is built with Flask/Python). Some of the rows contain accented characters; in particular, one row contains the name Péter. When I SELECT that row from my code, the value comes back as P\xc3\xa9ter, so I've had to do some work around character encoding. When I handle a GET request, I pull the data and try to return it as a JSON response, but I get this error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 1: invalid continuation byte
Other GET requests work fine on rows without accents, so I've narrowed the problem down to the accented characters.
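To make the error concrete, here is a minimal sketch of what that message usually means: the bytes being decoded are latin-1, not UTF-8. It is only an illustration and assumes nothing about the actual Flask code; the helper name try_decode_utf8 is hypothetical.

```python
# -*- coding: utf-8 -*-
# Illustration only: the same name encoded two ways.
name = u"Péter"

utf8_bytes = name.encode("utf-8")      # é becomes the two bytes 0xc3 0xa9
latin1_bytes = name.encode("latin-1")  # é becomes the single byte 0xe9


def try_decode_utf8(raw):
    """Hypothetical helper: decode bytes as UTF-8, or report why it failed."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.reason holds the short explanation from the codec
        return "failed: %s" % exc.reason


print(repr(try_decode_utf8(utf8_bytes)))    # decodes cleanly
print(repr(try_decode_utf8(latin1_bytes)))  # fails on byte 0xe9 at position 1
```

In UTF-8, 0xe9 is a lead byte that must be followed by continuation bytes, so a lone 0xe9 followed by "t" is rejected. That suggests some part of the path between the database and the JSON encoder is still handing over latin-1 bytes.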
I am using an Amazon RDS instance as my MySQL database. By default, RDS instances are latin-1 encoded. I've gone in and updated my parameter groups and everything now seems to be utf-8 encoded. Here are my character and collation variables:
+--------------------------+-------------------------------------------+
| Variable_name            | Value                                     |
+--------------------------+-------------------------------------------+
| character_set_client     | utf8                                      |
| character_set_connection | utf8                                      |
| character_set_database   | utf8                                      |
| character_set_filesystem | binary                                    |
| character_set_results    | utf8                                      |
| character_set_server     | utf8                                      |
| character_set_system     | utf8                                      |
| character_sets_dir       | /rdsdbbin/mysql-5.6.27.R1/share/charsets/ |
+--------------------------+-------------------------------------------+
8 rows in set (0.00 sec)
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_unicode_ci |
| collation_server     | utf8_unicode_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)
I rebooted the instance and even reloaded the entire database. As further clarification, when I run this API locally against my local MySQL database, it works fine (which again leads me to think it's an encoding problem, because the entire database has been imported directly from my localhost version).
I'm not entirely sure what my next troubleshooting step should be. Is it possible that the data is being saved incorrectly into the DB? I don't do any encoding before I insert it. The character does show up as an é when I run a SELECT statement from the command line (should it be encoded somehow in the DB?).
Thanks for your help!
Upvotes: 1
Views: 1363
Reputation: 3671
For anyone else having this issue: I just had to explicitly set charset = 'utf8' in my connection string. I tried encoding strings in the code, etc., but this did the trick immediately.
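As a rough sketch of what that looks like in code: the driver isn't named above, so this assumes a DB-API driver such as PyMySQL or mysqlclient, both of which accept a charset keyword in connect(). The helper name mysql_connect_kwargs and all credential values are hypothetical placeholders.

```python
def mysql_connect_kwargs(host, user, password, database, charset="utf8"):
    """Hypothetical helper: collect connection settings with an explicit charset.

    Setting charset explicitly means the client does not fall back to the
    driver's default encoding for the connection.
    """
    return {
        "host": host,
        "user": user,
        "password": password,
        "database": database,
        "charset": charset,
    }


# Placeholder values, not real credentials or a real endpoint.
kwargs = mysql_connect_kwargs("db.example.invalid", "api_user", "secret", "mydb")
print(kwargs["charset"])

# With PyMySQL installed, the connection would then be opened roughly as:
#   import pymysql
#   conn = pymysql.connect(**kwargs)
```

If the connection is configured through a URL instead (for example with SQLAlchemy), the equivalent is typically a ?charset=utf8 query parameter on the URL.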
Upvotes: 3