Reputation: 405
I'm trying to convert my Python 2 script to Python 3. How do I use regular expressions with Unicode strings?
This is what I had in Python 2, which works. It replaces straight double quotes with « and »:
text = re.sub(ur'"(.*?)"', ur'«\1»', text)
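For comparison, here is the same substitution in Python 3. It needs only the r prefix, because str literals are already Unicode. This is a minimal sketch; the function name guillemets and the sample sentence are illustrative only:

```python
import re


def guillemets(text: str) -> str:
    # Replace each "..." pair with «...»; the non-greedy (.*?) stops at the next quote.
    # Only the r prefix is needed: Python 3 str literals are already Unicode.
    return re.sub(r'"(.*?)"', r'«\1»', text)


print(guillemets('He said "hello" and "bye".'))  # prints: He said «hello» and «bye».
```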
I have some really complex ones where the "ur" prefix made things easy, but it doesn't work in Python 3:
text = re.sub(ur'ه\sایم([\]\.،\:»\)\s])', ur'ه\u200cایم\1', text)
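A possible Python 3 version of this second substitution is sketched below. The pattern can stay a raw string, since re interprets \s, \] and similar escapes itself (and has accepted \uXXXX in str patterns since Python 3.3). The replacement is trickier. Written as r'ه\u200cایم\1', it fails with re.error ("bad escape \u"), because re.sub's replacement template only understands a few escapes. Written without the r, \1 turns into the control character chr(1). One workaround is to concatenate a normal literal, which produces the real U+200C zero-width non-joiner, with a raw r'\1'. The names ZWNJ_PATTERN, ZWNJ_REPL and join_with_zwnj are hypothetical:

```python
import re

# Pattern: a raw str works as-is in Python 3; re itself interprets \s, \], \. etc.
ZWNJ_PATTERN = re.compile(r'ه\sایم([\]\.،\:»\)\s])')

# Replacement: the non-raw part turns \u200c into the real zero-width non-joiner,
# and the raw r'\1' keeps the group reference as backslash + 1.
# (r'ه\u200cایم\1' would raise re.error: bad escape \u, and a plain '\1'
# would become the control character chr(1).)
ZWNJ_REPL = 'ه\u200cایم' + r'\1'


def join_with_zwnj(text: str) -> str:
    """Replace 'ه ایم' followed by punctuation or space with 'ه' + ZWNJ + 'ایم'."""
    return ZWNJ_PATTERN.sub(ZWNJ_REPL, text)


print(join_with_zwnj('خسته ایم.') == 'خسته\u200cایم.')  # prints True
```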
Upvotes: 2
Views: 1758
Reputation: 12158
Since Python 3.0, the language features a str type that contains Unicode characters, meaning any string created using "unicode rocks!", 'unicode rocks!', or the triple-quoted string syntax is stored as Unicode.
The Unicode HOWTO in the Python documentation will help you.
So you can do what you did in Python 2 and it will work, without any extra steps.
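To illustrate what "stored as Unicode" means for regular expressions: in Python 3, str patterns are Unicode-aware by default, and the re.ASCII flag opts back into ASCII-only matching. This is a small sketch; the sample text is made up:

```python
import re

text = 'سلام world'

# str patterns are Unicode-aware by default in Python 3: \w matches Persian letters too.
print(re.findall(r'\w+', text))            # ['سلام', 'world']

# re.ASCII (re.A) restricts \w, \d, \s, etc. to ASCII characters.
print(re.findall(r'\w+', text, re.ASCII))  # ['world']

# len() counts code points, not UTF-8 bytes.
print(len('سلام'), len('سلام'.encode('utf-8')))  # 4 8
```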
Upvotes: 0
Reputation: 14519
All strings in Python 3 are Unicode by default. Just remove the "u" prefix and you should be fine.
In Python 2, strings are sequences of bytes by default, so we use "u" to mark them as Unicode strings.
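A short sketch of the prefix rules behind this answer. The u prefix on its own was re-allowed in Python 3.3 (PEP 414) only for compatibility and produces an ordinary str. The combined ur prefix is a syntax error, while r still works. Python 2's default byte strings correspond to Python 3's bytes type. The helper is_valid_expression is hypothetical and exists only for this demonstration:

```python
# u'...' produces an ordinary str in Python 3, so the prefix can simply be dropped.
print(u'«' == '«', type(u'«'))  # prints: True <class 'str'>


def is_valid_expression(source: str) -> bool:
    """Hypothetical helper: True if `source` compiles as a Python expression."""
    try:
        compile(source, '<example>', 'eval')
        return True
    except SyntaxError:
        return False


print(is_valid_expression("ur'x'"))  # False: the combined ur prefix is not valid Python 3
print(is_valid_expression("r'x'"))   # True: raw strings are still fine

# Encoding a str gives bytes, the Python 3 counterpart of Python 2's default str.
data = '«'.encode('utf-8')
print(type(data), data)  # prints: <class 'bytes'> b'\xc2\xab'
```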
Upvotes: 4