Reputation: 369
I am trying to use the re.UNICODE flag to match a string that may contain Unicode characters, but it doesn't seem to be working. For example:
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> r = re.compile(ur"(\w+)", re.UNICODE)
>>> r.findall(u"test test test", re.UNICODE)
[]
It works if I do not specify the unicode flag, but then obviously it will not work with unicode strings. What do I need to do to get this working?
Upvotes: 3
Views: 871
Reputation: 59156
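The second argument to r.findall is not flags, but pos, the index at which the search starts. re.UNICODE is just the integer 32, so your call starts searching at index 32, which is past the end of the 14-character string, and nothing matches. You don't need to specify flags again when you have already specified them in compile.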
>>> r = re.compile(ur"(\w+)", re.UNICODE)
>>> r.findall(u'test test test')
[u'test', u'test', u'test']
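To make the difference visible, here is a small sketch that runs the same compiled pattern three ways: with the flag misused as pos, with no second argument, and with a genuine start offset. The variable names are just for illustration, and I wrote the pattern as u"(\\w+)" rather than ur"(\w+)" so the snippet also runs on Python 3, where the ur prefix is not accepted.

```python
import re

# Flags go into compile(); the compiled pattern remembers them.
pattern = re.compile(u"(\\w+)", re.UNICODE)
text = u"test test test"  # 14 characters long

# Mistake from the question: the flag is taken as pos. re.UNICODE is the
# integer 32, so the search starts past the end of the string.
wrong = pattern.findall(text, re.UNICODE)

# Correct call: no second argument needed.
right = pattern.findall(text)

# pos used as intended: start searching at index 5 (the second word).
from_offset = pattern.findall(text, 5)

print(wrong)
print(right)
print(from_offset)
```

If you do want to pass flags at call time, the module-level re.findall(pattern, string, flags) takes them as its third argument; the method on a compiled pattern does not.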
Upvotes: 6