Amin Persia
Amin Persia

Reputation: 405

Unicode Regex in Python 3 (from Python 2 Code)

I'm trying to convert my Python 2 script to Python 3. How do we do Regex with Unicode?

This is what I had in Python 2 which works It replaces quotes to « and »:

text = re.sub(ur'"(.*?)"', ur'«\1»', text)

I have some really complex ones which the "ur" made it so easy. But it doesn't work in Python 3:

text = re.sub(ur'ه\sایم([\]\.،\:»\)\s])', ur'ه\u200cایم\1', text)

Upvotes: 2

Views: 1758

Answers (2)

宏杰李
宏杰李

Reputation: 12158

Since Python 3.0, the language features a str type that contain Unicode characters, meaning any string created using "unicode rocks!", 'unicode rocks!', or the triple-quoted string syntax is stored as Unicode.

Unicode HOWTO This doc will help you.

so, you just do want every you do in Python2, and it will works, no extra effects.

Upvotes: 0

Drathier
Drathier

Reputation: 14519

All strings in Python3 are unicode by default. Just remove the u and you should be fine.

In Python2 strings are lists of bytes by default, so we use u to mark them as unicode strings.

Upvotes: 4

Related Questions