Reputation: 99
I am using BeautifulSoup to scrape Chinese text from a Chinese website, and I tried to insert the scraped string into a MySQL database through MySQLdb in Python. However, I get a UnicodeEncodeError when I execute the query. The code is as follows:
movie_name_fail = my_beautifulsoup_object.find("div").text
my_cursor.execute("INSERT INTO MOVIE_TABLE VALUES(%s)",movie_name_fail)
It gives me the error:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-7: ordinal not in range(256)
But when I do
print movie_name_fail
the Chinese characters are printed correctly. I have also already declared
#!/usr/bin/python
# -*- coding: utf-8 -*-
as the encoding of my Python source file, but it did not help. However, when I type the same Chinese characters directly into my text editor (I am using Sublime Text), it works: I am able to insert the string into MySQL and display it correctly in the MySQL console (I have already set the CHARACTER SET of the table in MySQL to utf8):
movie_name_success = "超人总动员"
my_cursor.execute("INSERT INTO MOVIE_TABLE VALUES(%s)",movie_name_success)
I cannot figure out why the scraped string fails while the typed one works. I would really appreciate any help.
Update
My Python version is 2.7.8, and the MySQL version is 5.7.11.
I pushed my source code to GitHub; it should reproduce the error on line 117: "db_cursor.executemany(insert_sql,movie_tuple_list)"
https://github.com/shawnli2010/JHSaver/blob/master/LeTV_scraper.py
Upvotes: 0
Views: 1729
Reputation: 142258
Does that Python construct add quotes when doing the substitution? It needs to.
Did you establish utf8mb4 for the connection?
Is the table/column CHARACTER SET utf8mb4?
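As a rough illustration of the connection question: the error message names the 'latin-1' codec, which suggests the text was being encoded with a latin-1 connection character set rather than UTF-8. The sketch below first shows that the movie title cannot be encoded as latin-1 but can be as UTF-8, then builds the keyword arguments one might pass to MySQLdb.connect(). The host, user, passwd and db values are placeholders, and the helper names connection_kwargs and can_encode are made up for this example. Older MySQLdb releases may not accept "utf8mb4" as a charset name; in that case "utf8" is the usual fallback.

```python
# -*- coding: utf-8 -*-
# Sketch only: host, user, passwd and db are placeholder values.


def connection_kwargs():
    """Keyword arguments for MySQLdb.connect() that request a UTF-8 connection."""
    return {
        "host": "localhost",   # placeholder
        "user": "scraper",     # placeholder
        "passwd": "secret",    # placeholder
        "db": "movies",        # placeholder
        "charset": "utf8mb4",  # connection character set (try "utf8" on old MySQLdb)
        "use_unicode": True,   # text columns come back as unicode objects
    }


def can_encode(text, codec):
    """Return True if `text` can be encoded with `codec`, False otherwise."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


movie_name = u"超人总动员"

# This is the failure from the traceback: latin-1 has no Chinese characters.
latin1_ok = can_encode(movie_name, "latin-1")
# UTF-8 can represent the same string without error.
utf8_ok = can_encode(movie_name, "utf-8")

# With a real server available, the connection would be opened like this:
#   import MySQLdb
#   db = MySQLdb.connect(**connection_kwargs())
#   cursor = db.cursor()
#   cursor.execute("INSERT INTO MOVIE_TABLE VALUES(%s)", (movie_name,))
#   db.commit()
```

Passing the value as a one-element tuple lets the driver do the quoting and escaping, which addresses the first question above.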
I suggest utf8mb4 instead of utf8 because some Chinese characters need 4 bytes in UTF-8, and MySQL's utf8 stores at most 3 bytes per character.
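To make the byte-count point concrete, here is a small sketch comparing a common character from the question's movie title with a rarer one from the CJK Extension B block. The two sample characters are my own choice for illustration.

```python
# -*- coding: utf-8 -*-
# Compare the UTF-8 byte length of a common and a rare Chinese character.

common_char = u"\u8d85"      # a character from the movie title, inside the BMP
rare_char = u"\U00020000"    # first character of CJK Extension B, outside the BMP

common_len = len(common_char.encode("utf-8"))
rare_len = len(rare_char.encode("utf-8"))

# A column declared as MySQL utf8 can hold the first kind but not the second;
# utf8mb4 can hold both.
fits_in_mysql_utf8 = {
    "common": common_len <= 3,
    "rare": rare_len <= 3,
}
```

Here common_len is 3 and rare_len is 4, so only the first character fits in a 3-byte utf8 column.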
Upvotes: 1