Reputation: 1371
This is the second time in a few weeks that I've been stuck on an encoding issue. I've spent such a long time on this problem already, and I'd appreciate any help I can get.
This is what I want to do:
1) Select some rows from a MySQL table on my computer.
2) Write these rows into a text file.
3) Transfer the text file over to my Amazon EC2 Ubuntu instance.
4) Write the contents of the text file into a MySQL database.
5) Get Django to select some rows from the database in #4.
6) Show on the website.
In step #1, I just had an ordinary SELECT statement. In step #2, I did this:
file = codecs.open('commentsfordjango.txt', encoding = 'utf-8', mode='w')
file.write(fullcomment.decode('utf8') + '\n\n\n\n\n\n')
After step #2, I opened the .txt file in Windows and I could see all the actual Chinese characters without any error.
In step #3, I just transferred the file using WinSCP. In step #4, I did this:
file = open('/usr/local/src/blog/commentsfordjango.txt', 'r')
cursor.execute("INSERT INTO polls_poll (commenttext, pos, neu, neg) VALUES (%s, 0, 0, 0)", line)
In step #5, I did this in views.py: I simply returned the object which corresponded to the model. My model has a __unicode__ method, but I did not call it explicitly, as I read that it is already called by default when the object is displayed.
In step #6, my HTML file has the following line at the top of the file:
<meta charset="utf-8" />
Also, I changed my Apache encoding default to Unicode. I also made sure that my SQL database in step #4 is in Unicode.
However, after all this, my website still shows a bunch of unreadable, weird characters as such: 人在åšï¼Œå¤©åœ¨çœ‹ã€.
Any help will be very much appreciated - I've tried so many variations involving .decode() and .encode('utf-8') and spent far too long on this problem already!
Upvotes: 1
Views: 277
Reputation: 13624
A big part of the problem is probably that you're manually inserting items into the database instead of using Django's database ORM. The ORM takes care of all the encoding/decoding, making sure that you get good unicode out of the database, whatever the encoding used inside the database itself.
So: are you really sure you're inserting the right encoding into the database? You should probably do a quick test with the ORM. Make sure you read the file the correct way with codecs.open() (which you seem to be doing), put the contents into Django models, and save them.
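A rough sketch of that test, assuming Python 3 (io.open behaves the same way on Python 2) and assuming the model behind the polls_poll table is called Poll in an app named polls. read_comments and save_comments are hypothetical helper names, and the ORM call is left as a comment because it needs a configured Django project:

```python
import io


def read_comments(path):
    """Return the non-empty lines of a UTF-8 text file as text strings."""
    # io.open decodes while reading, so no manual .decode() calls are needed.
    with io.open(path, mode="r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def save_comments(path, create):
    """Pass every comment read from `path` to `create` and return the count."""
    comments = read_comments(path)
    for text in comments:
        # Field names are taken from the INSERT statement in the question.
        create(commenttext=text, pos=0, neu=0, neg=0)
    return len(comments)


# Inside a Django shell this could be used roughly as follows
# (Poll and its import path are assumptions inferred from the table name):
#
#     from polls.models import Poll
#     save_comments("/usr/local/src/blog/commentsfordjango.txt", Poll.objects.create)
```

Passing the create function in as an argument keeps the file handling testable without a database. If the characters still come out garbled after this, the file and the Python side are probably not where the bytes are being misread.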
Upvotes: 0
Reputation: 38382
In Step #2, you should encode your text as UTF-8.
with open("commentsfordjango.txt", "wb") as f:
    f.write(fullcomment.encode('utf8'))
In Step #4, you can then decode the data you read from the file back into unicode.
with open("commentsfordjango.txt", "rb") as f:
    for line in f.read().decode("utf-8").splitlines():
        cursor.execute("INSERT INTO polls_poll (commenttext, pos, neu, neg) VALUES (%s, 0, 0, 0)", line)
A better solution would be to just use Django's built-in loaddata/dumpdata facilities.
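For reference, garbage like the output shown in the question is typically what UTF-8 bytes look like when some layer decodes them as a single-byte encoding such as Latin-1. A minimal sketch in Python 3 syntax; the sample string is only illustrative, and which layer mis-decodes in the asker's setup is not established here:

```python
original = "人在做，天在看"

# Correct: text is encoded to UTF-8 bytes exactly once.
utf8_bytes = original.encode("utf-8")

# Wrong: reading those UTF-8 bytes as Latin-1 yields one junk character per byte.
garbled = utf8_bytes.decode("latin-1")

# The damage is reversible as long as no bytes were dropped along the way.
repaired = garbled.encode("latin-1").decode("utf-8")

print(repaired == original)
```

If stored data can be repaired this way, the bytes themselves are intact, and the mismatch is in how one step of the pipeline interprets them.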
Upvotes: 1