Reputation: 35720
For example, I have a file a.js whose content is:
Hello, 你好, bye.
It contains two Chinese characters whose Unicode escape form is \u4f60\u597d.
I want to write a Python program that converts the Chinese characters in a.js to their Unicode escape form and writes the result to b.js, whose content should be: Hello, \u4f60\u597d, bye.
My code:
fp = open("a.js")
content = fp.read()
fp.close()
fp2 = open("b.js", "w")
result = content.decode("utf-8")
fp2.write(result)
fp2.close()
But it seems that each Chinese character is still written as a single character, not as the ASCII escape sequence I want.
Upvotes: 2
Views: 6942
Reputation: 101
There are two ways to do this. The first is to use the 'encode' method:
str1 = "Hello, 你好, bye. "
print(str1.encode("raw_unicode_escape"))
print(str1.encode("unicode_escape"))
You can also use the 'codecs' module:
import codecs
# raw_unicode_escape_encode returns a (bytes, length) tuple, so take the first item
print(codecs.raw_unicode_escape_encode(str1)[0])
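To make the difference between the two codecs concrete, here is a minimal sketch, assuming Python 3 and a sample string with an embedded newline (the string and variable names are illustrative, not from the question):

```python
# Sample text with a real newline, to show how each codec treats it.
text = "Hello, 你好,\nbye."

# raw_unicode_escape: characters above U+00FF become \uXXXX;
# the newline is passed through unchanged.
raw = text.encode("raw_unicode_escape").decode("latin-1")

# unicode_escape: non-ASCII characters become escapes, and the
# newline is also rewritten as the two characters backslash + n.
full = text.encode("unicode_escape").decode("ascii")

print(raw)
print(full)
```

Note that raw_unicode_escape leaves characters in the Latin-1 range (such as é) as single bytes, so its output is only pure ASCII when the input has no such characters.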
Upvotes: 0
Reputation: 157
You can try the codecs module:
codecs.open(filename, mode[, encoding[, errors[, buffering]]])
a = codecs.open("a.js", "r", "cp936").read() # a is a unicode object
codecs.open("b.js", "w", "utf16").write(a)
Upvotes: 1
Reputation: 798526
>>> print u'Hello, 你好, bye.'.encode('unicode-escape')
Hello, \u4f60\u597d, bye.
But you should consider using JSON, via the json module.
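As a rough illustration of the JSON suggestion, assuming Python 3: json.dumps escapes non-ASCII characters by default (ensure_ascii=True) and wraps the result in double quotes, which yields a string literal that JavaScript can read back.

```python
import json

text = "Hello, 你好, bye."

# ensure_ascii defaults to True, so non-ASCII characters are
# emitted as \uXXXX escapes inside a double-quoted string.
escaped = json.dumps(text)
print(escaped)  # "Hello, \u4f60\u597d, bye."

# The escaping is reversible.
assert json.loads(escaped) == text
```

Unlike the unicode-escape codec, this also escapes quotes and backslashes, so it suits embedding a single string value rather than rewriting a whole source file.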
Upvotes: 5
Reputation: 97261
You can use repr:
a = u"Hello, 你好, bye. "
print repr(a)[2:-1]
Or you can use the encode method:
print a.encode("raw_unicode_escape")
print a.encode("unicode_escape")
Upvotes: -1
Reputation: 35720
I found that repr(content.decode("utf-8")) returns "u'Hello, \u4f60\u597d, bye'", so repr(content.decode("utf-8"))[2:-1] does the job.
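The repr slice depends on Python 2's u'...' formatting and also escapes newlines and quotes. A different approach, sketched here for Python 3, is to escape only the non-ASCII characters by hand. The helper names escape_non_ascii and convert_file are hypothetical, and the sketch assumes a.js is UTF-8 encoded:

```python
def escape_non_ascii(text):
    """Replace each non-ASCII character with a JavaScript-style \\uXXXX escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # ASCII (including newlines) is kept as-is
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)
        else:
            # Outside the BMP: emit a UTF-16 surrogate pair, which is
            # what JavaScript string escapes expect.
            cp -= 0x10000
            out.append("\\u%04x\\u%04x" % (0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)))
    return "".join(out)


def convert_file(src_path, dst_path):
    # Assumption: the source file is UTF-8 encoded.
    with open(src_path, "r", encoding="utf-8") as src:
        content = src.read()
    # The escaped text is pure ASCII, so writing as ASCII cannot fail.
    with open(dst_path, "w", encoding="ascii") as dst:
        dst.write(escape_non_ascii(content))
```

Calling convert_file("a.js", "b.js") would then write Hello, \u4f60\u597d, bye. to b.js while leaving line breaks and existing backslashes untouched.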
Upvotes: -1