Reputation: 20831
I am still pretty new to Python. I have a document in Japanese that I am trying to convert to a UTF-8 encoded document, and I don't really know what I should be getting in return when I do this. When I run the program I currently have, it just deletes everything and leaves me with a blank UTF-8 encoded document. Here is what I have; any help is greatly appreciated.
EDIT: I'm sorry, it was a typo; I fixed the original encoding. It is Shift-JIS.
import codecs
codecs.open("rshmn10j.txt", 'r', encoding='shift-jis')
newfile = codecs.open("rshmn10j.txt", 'w', encoding='utf-8')
newfile.write(u'\ufeff')
newfile.close()
Upvotes: 0
Views: 447
Reputation: 43850
If you're trying to convert a document from encoding "x" to UTF-8, you first have to read the document using the encoding it is actually stored in, and then write the decoded text back out as UTF-8.
import codecs

original_document_encoding = "shift-jis"  # common Japanese encoding

# Decode the source file into unicode text.
with codecs.open("rshmn10j.txt", 'r', encoding=original_document_encoding) as in_f:
    unicode_content = in_f.read()

# Write the same text to a separate file, encoded as UTF-8.
with codecs.open("rshmn10j.out.txt", 'w', encoding='utf-8') as out_f:
    out_f.write(unicode_content)
The with statement is used here to close each file automatically when its block is exited.
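As for why the code in the question leaves an apparently blank file: the result of the first codecs.open call is never assigned or read, and opening the same path again in 'w' mode truncates it, so the only thing ever written is the u'\ufeff' byte order mark. Below is a minimal sketch of the same read-then-write idea wrapped in a function, assuming Python 3, where the built-in open accepts an encoding argument. The name convert_to_utf8 and the demo file names are made up for illustration, and newline="" is an optional choice to leave the original line endings untouched.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding="shift_jis"):
    """Decode src_path using src_encoding and write the text to dst_path as UTF-8.

    convert_to_utf8 is a hypothetical helper name, not part of any library.
    """
    # newline="" disables newline translation so line endings are preserved.
    with open(src_path, "r", encoding=src_encoding, newline="") as in_f:
        text = in_f.read()
    # Write to a different path; opening the source itself with "w" would
    # truncate it before anything had been read.
    with open(dst_path, "w", encoding="utf-8", newline="") as out_f:
        out_f.write(text)
    return text


# Small self-contained demo using a temporary directory (illustrative only).
demo_dir = tempfile.mkdtemp()
demo_src = os.path.join(demo_dir, "sample_sjis.txt")
demo_dst = os.path.join(demo_dir, "sample_utf8.txt")

with open(demo_src, "wb") as f:
    f.write("こんにちは\n".encode("shift_jis"))

converted = convert_to_utf8(demo_src, demo_dst)
print(converted)
```

If the source file contains bytes that are not valid Shift-JIS, the read raises UnicodeDecodeError, which is usually a sign that the assumed source encoding is wrong.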
Upvotes: 2