How do I use re.UNICODE in Python 2.7?

Question

I am trying to use the re.UNICODE flag to match a string that may contain Unicode characters, but it doesn't seem to work. For example:

Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> r = re.compile(ur"(\w+)", re.UNICODE)
>>> r.findall(u"test test test", re.UNICODE)
[]

It works if I do not specify the unicode flag, but then obviously it will not work with unicode strings. What do I need to do to get this working?

khelwood · Accepted Answer

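The second argument to r.findall is not flags but pos, the index at which the search starts. re.UNICODE is the integer 32, so your call started searching at index 32, past the end of the 14-character string, and found nothing. You don't need to pass flags again once you have given them to re.compile.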

>>> r = re.compile(ur"(\w+)", re.UNICODE)
>>> r.findall(u'test test test')
[u'test', u'test', u'test']
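
To make the pos behaviour concrete, here is a small sketch rather than anything definitive. It assumes you want one script that runs on both Python 2.7 and Python 3, so the pattern is written as u"(\\w+)" instead of ur"(\w+)" (the ur prefix is not valid on Python 3). The names start, skipped, padded, partial and words are only illustrative.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import re

# Same pattern as ur"(\w+)", spelled so that it also parses on Python 3.
pattern = re.compile(u"(\\w+)", re.UNICODE)
text = u"test test test"

# Flags are integer constants; re.UNICODE has the value 32.
start = int(re.UNICODE)

# Passed as the second argument to findall(), the flag is read as pos,
# i.e. "start searching at index 32". The string has only 14 characters,
# so there is nothing left to match.
skipped = pattern.findall(text, re.UNICODE)

# On a longer string the same call silently skips the first 32 characters.
padded = u"x" * start + u" tail"
partial = pattern.findall(padded, re.UNICODE)

# Intended usage: give the flag to compile() only. With re.UNICODE,
# \w also matches non-ASCII letters such as e-acute (\u00e9).
words = pattern.findall(u"caf\u00e9 na\u00efve test")

print(start, len(text), skipped, partial, len(words))
```

The padded case is the one to watch for: no error is raised, the result is just missing whatever came before index 32.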
