tastyminerals

Reputation: 6548

How do you make re.sub() understand unicode?

I need to remove or replace any single German character, for example ü.

import re
re.sub(r'^\w{1}$', '', u'ü', re.U)
> u'\xfc'

The above code does not work. Why not, given that the documentation says:

re.U, re.UNICODE Make the \w, \W, \b, \B, \d, \D, \s and \S sequences dependent on the Unicode character properties database. Also enables non-ASCII matching for IGNORECASE.

Upvotes: 0

Views: 268

Answers (1)

user3850

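re.sub() takes flags as the 5th argument, not the 4th; the 4th is count. Your call therefore passes re.U as count and leaves flags at 0, so \w stays ASCII-only and never matches ü. Pass the flag by keyword instead: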

>>> re.sub(r'^\w$', '', u'ü', flags=re.U)
u''
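
Another way to sidestep the argument-order trap, sketched here as an alternative rather than the only fix, is to compile the pattern first: re.compile() takes flags as its 2nd argument, so there is no count parameter to collide with. The names flag_value, single_word_char and result are just illustrative.

```python
import re

# re.U is an integer bit flag, so when it is passed as the 4th positional
# argument of re.sub() it is silently used as the count value.
flag_value = int(re.U)

# re.compile(pattern, flags) has no count parameter, so the flag
# cannot end up in the wrong slot.
single_word_char = re.compile(r'^\w$', re.U)

# u'\xfc' is the character ü, written as an escape to avoid
# source-encoding issues.
result = single_word_char.sub(u'', u'\xfc')
```

Here result is the empty string, the same as with the flags=re.U call above.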

Upvotes: 1
