wong2

Reputation: 35720

Python convert file content to unicode form

For example, I have a file a.js whose content is:

Hello, 你好, bye.  

It contains two Chinese characters whose Unicode escape form is \u4f60\u597d.
I want to write a Python program that converts the Chinese characters in a.js to their Unicode escape form and writes the result to b.js, whose content should be: Hello, \u4f60\u597d, bye.

My code:

fp = open("a.js")
content = fp.read()
fp.close()

fp2 = open("b.js", "w")
result = content.decode("utf-8")
fp2.write(result)
fp2.close()  

But the Chinese characters are still written as single characters, not as the ASCII escape sequences I want.

Upvotes: 2

Views: 6942

Answers (5)

hitrust

Reputation: 101

There are two ways to do this. The first is to use the 'encode' method:

str1 = "Hello, 你好, bye. "
print(str1.encode("raw_unicode_escape"))
print(str1.encode("unicode_escape"))

You can also use the 'codecs' module:

import codecs
print(codecs.raw_unicode_escape_encode(str1))
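The two codecs named above differ in what they escape. The following is a minimal sketch, assuming Python 3 (where encode returns bytes) and a sample string with a trailing newline added only to make the difference visible:

```python
# Assumption: Python 3, where str.encode() returns a bytes object.
text = "Hello, 你好, bye.\n"

# unicode_escape escapes non-ASCII characters and also control
# characters such as the newline.
unicode_escaped = text.encode("unicode_escape")

# raw_unicode_escape escapes characters above U+00FF as \uXXXX but
# leaves the newline (and Latin-1 characters) as raw bytes.
raw_escaped = text.encode("raw_unicode_escape")

print(unicode_escaped)  # b'Hello, \\u4f60\\u597d, bye.\\n'
print(raw_escaped)      # b'Hello, \\u4f60\\u597d, bye.\n'
```

For a source file such as a.js, raw_unicode_escape is usually the closer fit because it keeps line breaks intact, although its output is not guaranteed to be pure ASCII if the text contains Latin-1 characters such as é.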

Upvotes: 0

ibear

Reputation: 157

You can try the codecs module:

codecs.open(filename, mode[, encoding[, errors[, buffering]]])

a = codecs.open("a.js", "r", "cp936").read() # a is a unicode object

codecs.open("b.js", "w", "utf16").write(a)
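Reading with an explicit encoding, as above, yields decoded text, but writing it back as UTF-16 does not by itself produce \uXXXX escapes. The sketch below shows one way to do the whole file conversion, assuming Python 3 and a UTF-8 input file; escape_non_ascii and convert_file are hypothetical helper names, not part of any library:

```python
# Assumption: Python 3, and a.js is stored as UTF-8.

def escape_non_ascii(text):
    # Replace every non-ASCII character with a JavaScript-style \uXXXX
    # escape; ASCII characters (including newlines) are left untouched.
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)
        else:
            # Characters outside the BMP become a UTF-16 surrogate pair.
            cp -= 0x10000
            out.append("\\u%04x\\u%04x" % (0xD800 + (cp >> 10),
                                           0xDC00 + (cp & 0x3FF)))
    return "".join(out)


def convert_file(src_path, dst_path, src_encoding="utf-8"):
    # newline="" keeps the original line endings unchanged.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="ascii", newline="") as dst:
        dst.write(escape_non_ascii(text))
```

Calling convert_file("a.js", "b.js") would then write Hello, \u4f60\u597d, bye. to b.js for the input in the question.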

Upvotes: 1

Ignacio Vazquez-Abrams

Reputation: 798526

>>> print u'Hello, 你好, bye.'.encode('unicode-escape')
Hello, \u4f60\u597d, bye.

But you should consider using JSON, via the json module.
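As a minimal sketch of the JSON suggestion, assuming Python 3 and that the goal is a quoted JavaScript string literal rather than bare text: json.dumps escapes non-ASCII characters by default (ensure_ascii=True) and also handles quotes and backslashes.

```python
import json

text = "Hello, 你好, bye."

# ensure_ascii defaults to True, so non-ASCII characters are emitted
# as \uXXXX escapes; the result includes the surrounding double quotes.
literal = json.dumps(text)

print(literal)  # "Hello, \u4f60\u597d, bye."
```

Because the result is valid JSON, it can be pasted into JavaScript source as a string literal, and json.loads(literal) returns the original text.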

Upvotes: 5

HYRY

Reputation: 97261

You can use repr:

a = u"Hello, 你好, bye. "
print repr(a)[2:-1]

Or you can use the encode method:

print a.encode("raw_unicode_escape")
print a.encode("unicode_escape")

Upvotes: -1

wong2

Reputation: 35720

I found that repr(content.decode("utf-8")) returns "u'Hello, \u4f60\u597d, bye'",
so repr(content.decode("utf-8"))[2:-1] does the job.
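The repr trick above relies on Python 2, where repr of a unicode object escapes non-ASCII characters and adds a u prefix. In Python 3, repr keeps printable non-ASCII characters as they are, and the built-in ascii() gives the escaped form instead. A minimal sketch, assuming Python 3:

```python
# Assumption: Python 3. ascii() behaves like repr() but escapes
# non-ASCII characters; there is no u prefix, so only the surrounding
# quotes are stripped with [1:-1].
text = "Hello, 你好, bye."

escaped = ascii(text)[1:-1]

print(escaped)  # Hello, \u4f60\u597d, bye.
```

Like the original, this is fragile: quotes and backslashes inside the text are also escaped by ascii(), so it suits quick experiments more than general file conversion.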

Upvotes: -1
