showkey

Reputation: 298

How to properly use Chinese characters in a regex in Python?

I am on Windows 7 with Python 3.3; the cmd code page is 936.

>>> import re
>>> if(re.search(r"仟|佰|千|百","百万")):print("ok1")
...
ok1
>>> if(re.search(u"仟|佰|千|百","百万")):print("ok2")
...
ok2

When I save the following as g:\test_number.py:

# -*- coding: utf-8 -*- 
import re
if(re.search(r"仟|佰|千|百","百万")):print("ok1")
if(re.search(u"仟|佰|千|百","百万")):print("ok2")

and run it with python g:\\test_number.py, I get this error:

C:\Windows\system32\cmd.exe /c (python \test_number.py)
File "\test_number.py", line 3
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xb0 in position 3:
 invalid start byte
shell returned 1
Hit any key to close this window...

What is the matter? When I change my code to the following, I get the same error.

# -*- coding: utf-8 -*- 
import re
output=open("g://number","w",encoding="utf-8")
if(re.search(r"仟|佰|千|百","百万")):output.write("ok1")
if(re.search(u"仟|佰|千|百","百万")):output.write("ok2")
output.close()


Upvotes: 1

Views: 1115

Answers (1)

falsetru

Reputation: 369274

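Make sure your editor is configured to save the file as UTF-8. The "# -*- coding: utf-8 -*-" line only tells Python how to decode the file; it does not change the encoding the editor actually writes.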
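
The traceback is consistent with the file having been saved in the console's code page 936 (GBK) instead of UTF-8, although the question does not confirm which editor or encoding was used. A minimal sketch under that assumption: it encodes the pattern as GBK, shows that decoding those bytes as UTF-8 fails at byte 0xb0 in position 3, and then rewrites a throwaway file as UTF-8. The name reencode_to_utf8 is a hypothetical helper for illustration, not a standard library function.

```python
import os
import tempfile

PATTERN = "仟|佰|千|百"

# Bytes an editor would write for the pattern if it saved in
# code page 936 (GBK). This is the assumed cause, not a confirmed one.
gbk_bytes = PATTERN.encode("gbk")

# Decoding GBK bytes as UTF-8 fails, much like the reported SyntaxError.
decode_error = None
try:
    gbk_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = exc
    print("byte 0x%02x at position %d: %s"
          % (exc.object[exc.start], exc.start, exc.reason))


def reencode_to_utf8(path, source_encoding="gbk"):
    """Hypothetical helper: rewrite a text file in place as UTF-8."""
    # newline="" keeps the original line endings untouched.
    with open(path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Try the helper on a throwaway file, not on the real script.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "test_number.py")
    with open(script, "wb") as f:
        f.write(b"# -*- coding: utf-8 -*-\n")
        f.write(b'pattern = r"' + gbk_bytes + b'"\n')
    reencode_to_utf8(script)
    # After conversion the file reads cleanly as UTF-8.
    with open(script, encoding="utf-8") as f:
        roundtrip_ok = PATTERN in f.read()
```

In practice, re-saving the file as UTF-8 from the editor achieves the same thing. Alternatively, leaving the file in GBK and changing the declaration to "# -*- coding: gbk -*-" should also let Python decode it, since the declaration then matches the bytes on disk.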


Upvotes: 2
