Reputation: 33
I want to be able to match URL requests containing internationalized characters, like /Comisión. This is my setup:
import tornado.web

class Application(tornado.web.Application):
    def __init__(self):
        handlers = [
            # ... some handlers, and then this:
            (r"/([\w\:\,]+)", InternationalizedHandler),
        ]
        settings = {}  # application settings (defined elsewhere in my code)
        tornado.web.Application.__init__(self, handlers, **settings)
But setting locales in Tornado doesn't seem to be the right solution. How can I set up the regex to catch characters such as é, å, µ etc.? Would changing the re mode (flags) in Python help?
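To see what changing the re mode does on its own, outside Tornado, here is a minimal sketch in Python 3 that matches the question's pattern against an already-decoded path. It assumes Python 3, where str patterns are Unicode-aware by default; passing re.ASCII reproduces the ASCII-only \w that Python 2 str patterns have without re.UNICODE.

```python
import re

# The handler pattern from the question, tried directly with Python's re module.
PATTERN = r"/([\w\:\,]+)"

path = "/Comisi\u00f3n"  # "/Comisión", already percent-decoded

# Python 3 str patterns use Unicode matching by default, so \w covers "ó".
unicode_match = re.fullmatch(PATTERN, path)

# re.ASCII restricts \w to [a-zA-Z0-9_], like Python 2 str patterns without re.UNICODE.
ascii_match = re.fullmatch(PATTERN, path, re.ASCII)

print(unicode_match.group(1) if unicode_match else None)  # Comisión
print(ascii_match)  # None
```

So on a decoded string the flag does make the difference; whether Tornado ever hands the pattern a decoded string is a separate question.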
Upvotes: 3
Views: 404
Reputation: 1938
TL;DR: It's impossible to do with Tornado's built-in router.
Tornado buries the regexp compiling for handler patterns pretty deep, so @stema's suggestion to use the re.UNICODE flag is difficult to apply, because it's not immediately clear where to pass in the flag. There are two ways to tackle that particular problem: subclass URLSpec and override its __init__ method, or put a flag prefix in the pattern.
The first option is a lot of work. The second option takes advantage of a feature of Python's re module whereby a pattern may begin with the inline flag (?u) instead of passing re.UNICODE as a parameter.
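The inline-flag trick looks like this in Python 3, where str patterns are already Unicode-aware, so both spellings are redundant there but still accepted. The point is that the (?u) prefix carries the flag inside the pattern string itself, which is the only thing Tornado receives from the handler list:

```python
import re

# Passing the flag as an argument to re.compile()...
with_arg = re.compile(r"/([\w\:\,]+)", re.UNICODE)

# ...versus embedding it as an inline prefix, which needs no extra parameter
# and therefore survives being handed to code that calls re.compile() itself.
inline = re.compile(r"(?u)/([\w\:\,]+)")

print(bool(with_arg.flags & re.UNICODE), bool(inline.flags & re.UNICODE))  # True True

path = "/Comisi\u00f3n"  # "/Comisión"
print(with_arg.fullmatch(path).group(1) == inline.fullmatch(path).group(1))  # True
```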
Unfortunately, neither option will work, since Tornado matches patterns against the request path before percent-decoding it into a Unicode string. Compiling the pattern with the Unicode flag therefore has no effect: you're matching against a percent-encoded ASCII path, not a Unicode string.
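A minimal illustration of why the flag does not help, using Python 3's urllib.parse.quote to produce the percent-encoded form a client sends. It assumes the path is UTF-8 encoded, and it uses fullmatch as a stand-in for Tornado's anchored match against the raw request path:

```python
import re
from urllib.parse import quote, unquote

# Unicode-flagged version of the question's pattern.
pattern = re.compile(r"(?u)/([\w\:\,]+)")

decoded = "/Comisi\u00f3n"  # "/Comisión"
encoded = quote(decoded)    # the form that actually arrives in the request path

print(encoded)  # /Comisi%C3%B3n

# fullmatch approximates matching the whole path.
print(pattern.fullmatch(decoded) is not None)  # True
print(pattern.fullmatch(encoded) is not None)  # False: '%' is not a word character
print(unquote(encoded) == decoded)             # True
```

The Unicode flag only changes how \w treats non-ASCII characters, and after encoding there are none left in the string being matched.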
Upvotes: 3
Reputation: 2621
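If you look here, you can see what your expression "means": http://regex101.com/r/zO9zC8
If you want to match é, å, µ, you need to match the inverse of a-zA-Z0-9, which would be [^a-zA-Z0-9]. Seeing as you used \w before, you may as well use \W, which is the same as [^\w]. (This relies on \w being ASCII-only, as with Python 2 str patterns compiled without re.UNICODE; in Python 3, \w already matches these letters, so \W does not.)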
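The following sketch (Python 3, standard re module only) shows that whether \W matches these characters depends on the matching mode, while the explicit negated class matches them either way:

```python
import re

chars = "\u00e9\u00e5\u00b5"  # "éåµ" (µ here is U+00B5 MICRO SIGN)

# In Unicode mode (the Python 3 default for str patterns), these letters count
# as word characters, so \W does not match them.
print(re.findall(r"\W", chars))  # []

# With ASCII-only \w (re.ASCII, or Python 2 without re.UNICODE), \W does match them.
print(re.findall(r"\W", chars, re.ASCII))  # ['é', 'å', 'µ']

# The explicit negated class matches them in either mode.
print(re.findall(r"[^a-zA-Z0-9]", chars))  # ['é', 'å', 'µ']
```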
Good luck!
Edit: Re-reading your question, I suggest you follow @stema's answer instead.
Upvotes: 1