How do you make re.sub() understand unicode?

Question

I need to remove or replace any single German character, for example ü.

>>> import re
>>> re.sub(r'^\w{1}$', '', u'ü', re.U)
u'\xfc'

The above code does not work. Why not, given that the documentation says:

re.U, re.UNICODE: Make the \w, \W, \b, \B, \d, \D, \s and \S sequences dependent on the Unicode character properties database. Also enables non-ASCII matching for IGNORECASE.
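A minimal sketch of what the quoted flag controls, written for Python 3 (an assumption; the thread itself uses Python 2 syntax). In Python 3, str patterns already get Unicode matching by default, and re.ASCII switches \w back to ASCII-only rules, which is roughly how Python 2 behaved when re.U was not in effect:

```python
import re

# Python 3 sketch: str patterns use Unicode rules for \w by default,
# while re.ASCII restricts \w to [a-zA-Z0-9_] (the Python 2 default without re.U).
word = 'ü'

unicode_match = re.fullmatch(r'\w', word)                # Unicode rules
ascii_match = re.fullmatch(r'\w', word, flags=re.ASCII)  # ASCII-only rules

print(unicode_match is not None)  # True
print(ascii_match is not None)    # False
```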

user3850 · Accepted Answer

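re.sub() takes flags as the 5th argument, not the 4th. Its signature is re.sub(pattern, repl, string, count=0, flags=0), so your call passed re.U (which is just the integer 32) as count and left flags at 0. Without the flag, \w in Python 2 only matches [a-zA-Z0-9_], so ü is not matched. Pass the flag by keyword (the flags argument was added in Python 2.7) and it will work: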

>>> re.sub(r'^\w$', '', u'ü', flags=re.U)
u''
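To see what the question's call actually requested, here is a small Python 3 sketch (an assumption, since the thread is Python 2) that passes re.U explicitly as count, which is how the fourth positional argument is interpreted. The 40-character string is just an illustrative input:

```python
import re

# re.sub(pattern, repl, string, count=0, flags=0): the 4th parameter is count.
# re.U is the integer 32, so passing it 4th means "at most 32 replacements".
print(int(re.U))  # 32

text = 'a' * 40

# This is effectively what re.sub('a', 'b', text, re.U) asks for.
as_count = re.sub('a', 'b', text, count=int(re.U))
print(as_count.count('b'))  # 32 (the remaining 8 'a's are untouched)

# Passing the flag by keyword leaves count at its default of 0 (replace all).
as_flags = re.sub('a', 'b', text, flags=re.U)
print(as_flags.count('b'))  # 40
```

On Python 3 the question's original call would match ü anyway, because \w is Unicode-aware for str patterns by default; the lost flag only changed the result on Python 2. Passing flags by keyword is still the safer habit, and Python 3.13 deprecates passing count and flags positionally.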
