Omer Shapira

Reputation: 33

Tornado request handler mapping to international characters

I want to be able to match URL requests for some internationalized characters, like /Comisión. This is my setup:

import tornado.web

class Application(tornado.web.Application):
    def __init__(self):
        handlers = [
            # some handlers, and then this:
            (r"/([\w\:\,]+)", InternationalizedHandler),
        ]
        # settings is a dict defined elsewhere in the module
        tornado.web.Application.__init__(self, handlers, **settings)

But setting locales in Tornado doesn't seem to be the right solution. How can I set up the regex to catch characters such as é, å, µ, etc.? Would changing the re mode in Python do it?

Upvotes: 3

Views: 404

Answers (2)

Brendan Berg

Reputation: 1938

TL;DR: It's impossible to do with Tornado's built-in router.

Tornado buries the regexp compiling for handler patterns pretty deep, so @stema's suggestion to use the re.UNICODE flag is difficult, because it's not immediately clear where to pass in the flag. There are two ways to tackle that particular problem: subclass URLSpec and override its __init__ method, or put a flag prefix in the pattern.

The first option is a lot of work. The second option takes advantage of a feature in Python's re module: a pattern may start with (?u) instead of being compiled with the re.UNICODE flag passed in as a parameter.
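To illustrate the flag prefix on its own, outside Tornado, here is a minimal sketch. It assumes Python 3, where str patterns already match Unicode word characters by default, so the flag is effectively redundant there; the variable names and the sample path are made up for the example.

```python
import re

# Assumption: Python 3, where str patterns already use Unicode matching,
# so both spellings below behave the same as the bare pattern would.
with_flag = re.compile(r"/([\w:,]+)", re.UNICODE)   # flag passed as an argument
with_prefix = re.compile(r"(?u)/([\w:,]+)")         # flag embedded in the pattern

path = "/Comisión"  # an already-decoded path, used only for illustration
captured_flag = with_flag.match(path).group(1)
captured_prefix = with_prefix.match(path).group(1)
print(captured_flag, captured_prefix)
```

Both compiled patterns capture the same segment, which is why the prefix form is handy when you only control the pattern string and not the call to re.compile.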

Unfortunately, neither option will work, since Tornado matches patterns against the request URL before percent-decoding it into a Unicode string. Compiling the pattern with the Unicode flag therefore has no effect: you're matching against percent-encoded ASCII URLs, not Unicode strings.
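The effect of percent-encoding can be reproduced with the standard library alone. This is only a sketch of the mismatch described above, not of Tornado's internals: the pattern is the one from the question with the (?u) prefix added, anchored at the end here so that a partial match does not count, and the variable names are hypothetical.

```python
import re
from urllib.parse import quote, unquote

# The question's pattern with the (?u) prefix, anchored at the end.
pattern = re.compile(r"(?u)/([\w:,]+)$")

decoded_path = "/Comisión"
encoded_path = quote(decoded_path)   # percent-encoded form of the same path
round_trip = unquote(encoded_path)   # decoding restores the original string

matches_decoded = pattern.match(decoded_path) is not None
matches_encoded = pattern.match(encoded_path) is not None
print(encoded_path, matches_decoded, matches_encoded)
```

The encoded path contains % characters, which \w does not match under any flag, so the pattern only succeeds on the decoded string.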

Upvotes: 3

Firas Dib

Reputation: 2621

If you look here you see what your expression "means": http://regex101.com/r/zO9zC8

If you want to match é, å, µ, you need to match the inverse of a-zA-Z0-9, which would be [^a-zA-Z0-9]. Seeing as you used \w before, you may as well use \W, which is the same as [^\w].
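A quick way to check what these classes actually pick up is the sketch below. It assumes Python 3, where \w is Unicode-aware for str patterns unless re.ASCII is given; the sample text is made up.

```python
import re

text = "/Comisión,"

# Everything that is not an ASCII letter or digit, including "/" and ",".
not_ascii_alnum = re.findall(r"[^a-zA-Z0-9]", text)

# \W under Python 3's default Unicode matching: "ó" counts as a word character.
non_word_unicode = re.findall(r"\W", text)

# \W restricted to ASCII semantics: "ó" is now treated as a non-word character.
non_word_ascii = re.findall(r"\W", text, re.ASCII)

print(not_ascii_alnum, non_word_unicode, non_word_ascii)
```

Note that the negated classes also match separators such as "/" and ",", so they are broader than "accented letters only".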

Good luck!

Edit: Re-reading your question, I suggest you follow @stema's answer instead.

Upvotes: 1
