Writing a babel message parser and file extension

Question

I'd like to get Babel to parse a file and find translation strings that simply start with:

(_

and end with:

)

So the translatable part of file.myext could be:

(_ "message")

String literals always start and end with a double quote (").

There is some documentation on doing this at: http://babel.pocoo.org/en/latest/messages.html?highlight=parser

But this seems overwhelmingly complicated. Can someone provide a simple example of a custom message extractor for Babel under the above constraints?

I found the Jinja2 extractor at: https://github.com/pallets/jinja/blob/99498320871a290f5799d4f96a7774fc8a34381e/jinja2/ext.py

But huh?!

The Django integration also has its own extractor: https://github.com/python-babel/django-babel/blob/master/django_babel/extract.py

jwhitlock · Accepted Answer

These appear complex because they use lexical analysis ("lexers") to parse the input and find the strings. That may seem overly complicated, but lexing is a very mature area of computer science and the right tool for the job. Most beginners start with regular expressions and custom code for this kind of task and, if they persist and learn from what's available, end up with a lexer and parser.

For your own definition, you are looking for:

An opening parenthesis (
An underscore _
Zero or more whitespace characters (space, newline, tab, etc.)
An opening double quote "
The text that you want to extract
A closing double quote "
Zero or more whitespace characters
A closing parenthesis )

This is a great problem for the many lexing / parsing libraries in Python and will be a perfect way to introduce yourself to this technology.
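Before reaching for a full lexer, the pattern listed above can be sketched with a regular expression wrapped in a Babel extraction callable. Babel documents extraction methods as callables that take (fileobj, keywords, comment_tags, options) and yield (lineno, funcname, message, comments) tuples. The function name extract_myext and the "encoding" option are my own assumptions, and this sketch deliberately ignores escaped quotes:

```python
import io
import re

# "(" + "_" + optional whitespace + double-quoted text + optional whitespace + ")".
# Assumption: the text never contains a double quote (no escape handling).
PATTERN = re.compile(r'\(_\s*"([^"]*)"\s*\)')


def extract_myext(fileobj, keywords, comment_tags, options):
    """Hypothetical Babel extraction callable for .myext files.

    Yields (lineno, funcname, message, comments) tuples.
    """
    # Babel hands extractors a binary file object; "encoding" is an
    # assumed option name that falls back to UTF-8.
    encoding = options.get("encoding", "utf-8")
    text = fileobj.read().decode(encoding)
    for match in PATTERN.finditer(text):
        # Line number of the opening parenthesis, counted from 1.
        lineno = text.count("\n", 0, match.start()) + 1
        yield lineno, "_", match.group(1), []


if __name__ == "__main__":
    sample = b'some text\n(_ "message")\n'
    print(list(extract_myext(io.BytesIO(sample), ["_"], [], {})))
```

To use it for real, you would point Babel at the function from the [extractors] section of your mapping file and map **.myext to it; check the Babel docs for the exact syntax.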

You'll also want to consider some other cases:

(_ 'single quotes')
(_ '''multi
      line
      quotes''')
(_ "strings with \"escaped quotes\".")
(_ "strings with 'mixed quotes'")
(_ "strings that are just wrong')
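To give a feel for how quickly those cases complicate things, here is a rough regex-based sketch that tries to cope with them. It is not a real lexer, and the names find_messages and SAMPLE are made up for illustration. It assumes triple-quoted strings may span lines, ordinary quoted strings may not, and that only backslash-escaped quotes and backslashes need unescaping:

```python
import re

# Triple-quoted string: may span lines; a backslash escapes the next character.
TRIPLE = r"(?P<tq>'''|\"\"\")(?P<tbody>(?:\\.|(?!(?P=tq))[^\\])*)(?P=tq)"
# Ordinary string: must close with the same quote character on the same line.
SINGLE = r"(?P<sq>'|\")(?P<sbody>(?:\\.|(?!(?P=sq))[^\\\n])*)(?P=sq)"
# Triple-quoted is tried first so ''' is not misread as an empty '' string.
CALL = re.compile(r"\(_\s*(?:" + TRIPLE + "|" + SINGLE + r")\s*\)", re.DOTALL)
# Simplification: only \" \' and \\ are unescaped; \n and friends are left alone.
UNESCAPE = re.compile(r"\\([\"'\\])")

SAMPLE = r"""(_ 'single quotes')
(_ '''multi
      line
      quotes''')
(_ "strings with \"escaped quotes\".")
(_ "strings with 'mixed quotes'")
(_ "strings that are just wrong')
"""


def find_messages(text):
    """Return (lineno, message) pairs for every well-formed (_ ...) call."""
    results = []
    for match in CALL.finditer(text):
        if match.group("tq"):
            body = match.group("tbody")
        else:
            body = match.group("sbody")
        lineno = text.count("\n", 0, match.start()) + 1
        results.append((lineno, UNESCAPE.sub(r"\1", body)))
    return results


if __name__ == "__main__":
    for lineno, message in find_messages(SAMPLE):
        print(lineno, repr(message))
```

The mismatched-quote line is silently skipped here. A proper lexer could report it as an error with a line number instead, which is one of the reasons the existing extractors look the way they do.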
