Regex, split and Python

Question

Can anybody help me understand what these lines are doing?

import re

VAR_TOKEN_START = '{{'
VAR_TOKEN_END = '}}'
BLOCK_TOKEN_START = '{%'
BLOCK_TOKEN_END = '%}'
TOK_REGEX = re.compile(r"(%s.*?%s|%s.*?%s)" % (
    VAR_TOKEN_START,
    VAR_TOKEN_END,
    BLOCK_TOKEN_START,
    BLOCK_TOKEN_END
))

TOK_REGEX.split('{% each vars %}{{it}}{% endeach %}')

I don't understand the % operator in the regex expression, or why we call split on the compiled TOK_REGEX pattern.

user2357112 · Accepted Answer

This part:

TOK_REGEX = re.compile(r"(%s.*?%s|%s.*?%s)" % (
    VAR_TOKEN_START,
    VAR_TOKEN_END,
    BLOCK_TOKEN_START,
    BLOCK_TOKEN_END
))

uses string formatting to build a regex in a more understandable manner than just a jumble of characters. The % operator replaces each %s with the contents of the corresponding string in the following tuple. This allows the author of the code to give meaningful names to the {{, }}, {%, and %} parts of the regex.
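To make the substitution concrete, here is a minimal sketch that builds the same pattern string and prints it before compiling. The name `pattern` is introduced here only for illustration; everything else comes from the question's code.

```python
import re

VAR_TOKEN_START = '{{'
VAR_TOKEN_END = '}}'
BLOCK_TOKEN_START = '{%'
BLOCK_TOKEN_END = '%}'

# Each %s is replaced, in order, by the matching item of the tuple.
pattern = r"(%s.*?%s|%s.*?%s)" % (
    VAR_TOKEN_START,
    VAR_TOKEN_END,
    BLOCK_TOKEN_START,
    BLOCK_TOKEN_END,
)
print(pattern)  # ({{.*?}}|{%.*?%})

# Compiling the formatted string gives the same regex as writing it by hand.
TOK_REGEX = re.compile(pattern)
```

So the regex reads: match either `{{`, anything (as little as possible), `}}`, or `{%`, anything (as little as possible), `%}`.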

The split call:

TOK_REGEX.split('{% each vars %}{{it}}{% endeach %}')

is equivalent to calling re.split with the compiled pattern. It finds every occurrence of text matching the regex in the argument string and returns a list of the pieces between those matches. Because the whole regex is wrapped in a capturing group (the parentheses in the pattern string), the matched text itself is also included in the list.
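As a rough illustration of that behaviour, the sketch below runs the split from the question and compares it with a non-capturing version of the same pattern. The variable names and the 'Hello {{name}}!' input are made up for this example.

```python
import re

TOK_REGEX = re.compile(r"({{.*?}}|{%.*?%})")

# The three tokens sit back to back, so the text "between" them is empty.
# Those empty strings still appear in the result.
tokens = TOK_REGEX.split('{% each vars %}{{it}}{% endeach %}')
print(tokens)
# ['', '{% each vars %}', '', '{{it}}', '', '{% endeach %}', '']

# With literal text around a token, the pieces are easier to see.
with_group = TOK_REGEX.split('Hello {{name}}!')
print(with_group)  # ['Hello ', '{{name}}', '!']

# Same pattern without the capturing group: the matches are dropped.
without_group = re.split(r"{{.*?}}|{%.*?%}", 'Hello {{name}}!')
print(without_group)  # ['Hello ', '!']
```

Keeping the matches in the list is presumably why the pattern is wrapped in parentheses: a template engine needs both the literal text and the tokens, in order. Filtering out the empty strings afterwards is a common follow-up step.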
