Python 3.3 re: How to grab all possible groups?

Question

The regex captures only the last match when a group repeats an unknown number of times (>= 0):

>>> re.findall(r"{% url '(\w+)'(?:\s+(\w+))* %}","{% url 'a' b %}")
[('a', 'b')]
>>> re.findall(r"{% url '(\w+)'(?:\s+(\w+))* %}","{% url 'a' b c %}")
[('a', 'c')]
>>> re.findall(r"{% url '(\w+)'(?:\s+(\w+))* %}","{% url 'a' b c e %}")
[('a', 'e')]

How can I grab all the groups, like this (what I imagine):

>>> re.findall(r"{% url '(\w+)'(?:\s+(\w+))* %}","{% url 'a' b %}")
[('a', 'b')]
>>> re.findall(r"{% url '(\w+)'(?:\s+(\w+))* %}","{% url 'a' b c %}")
[('a', 'b', 'c')]
>>> re.findall(r"{% url '(\w+)'(?:\s+(\w+))* %}","{% url 'a' b c e %}")
[('a', 'b', 'c', 'e')]

Note: this is a simplified case, to make my question easy to understand. Solutions such as s.split() won't work for the more complicated one.

My real need is the following (note that the amount of whitespace is unknown, >= 1):

grab ["'funcname'", 'first'] from "{% url 'funcname'    first   %}"
grab ["'funcname'", 'first', 'second'] from "{% url 'funcname'  first    second %}"
grab ["'funcname'", 'first', 'second','third'] from "{% url 'funcname'  first second     third    %}"

Or more complicated:

grab ["'funcname'", 'first','fir'] from "{% url 'funcname'    first = fir   %}"
grab ["'funcname'", 'first','fir', 'second', 'sec'] from "{% url 'funcname'  first=fir    second   = sec %}"
grab ["'funcname'", 'first','fir', 'second', 'sec', 'third', 'thi'] from "{% url 'funcname'  first =fir    second = sec    third=thi    %}"

Martijn Pieters · Accepted Answer

You put a repetition quantifier (*) around the capturing group:

(?:\s+(\w+))*

but capturing groups do not multiply; each one has a fixed group number, and every repetition is assigned to that same number, overwriting what it captured before. Hence you only ever see the last match.
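
To see this directly, here is a small sketch that inspects the match object for the pattern from the question. The variable names are only illustrative:

```python
import re

# The pattern from the question: group 2 sits inside a repeated
# non-capturing group, so it can match several times.
pattern = re.compile(r"{% url '(\w+)'(?:\s+(\w+))* %}")

m = pattern.search("{% url 'a' b c e %}")

# Group 2 matched 'b', then 'c', then 'e', but the match object has
# only one slot for it, so only the final repetition survives.
print(m.groups())  # ('a', 'e')
```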

You'll have to capture all candidates in one group and split afterwards:

[r[:1] + tuple(r[1].split()) 
 for r in re.findall(r"{% url '(\w+)'((?:\s+\w+)*) %}", inputtext)]

Note that the capturing group now wraps the whole (?:\s+\w+)* pattern, so it captures all repetitions as a single string.

Demo:

>>> import re
>>> inputtext = "{% url 'a' b c e %}"
>>> [r[:1] + tuple(r[1].split()) 
...  for r in re.findall(r"{% url '(\w+)'((?:\s+\w+)*) %}", inputtext)]
[('a', 'b', 'c', 'e')]
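
The pattern above expects exactly one space before the closing %}. The inputs in the question can have any amount of whitespace there, so a possible variant, sketched here under the assumption that \s* before %} is acceptable, looks like this. The names URL_TAG and url_args are hypothetical helpers, not part of the original answer:

```python
import re

# Same idea as above: capture all arguments in one group, split later.
# \s* before %} tolerates any amount of trailing whitespace.
URL_TAG = re.compile(r"{%\s*url\s+'(\w+)'((?:\s+\w+)*)\s*%}")

def url_args(text):
    """Return one tuple per url tag: (name, arg1, arg2, ...)."""
    return [(name,) + tuple(rest.split())
            for name, rest in URL_TAG.findall(text)]

print(url_args("{% url 'funcname'  first second     third    %}"))
```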

Your second form is more complex, and requires that you use another regular expression to split out the matches:

from itertools import chain

[r[:1] + tuple(chain(*re.findall(r'(\w+)\s*=\s*(\w+)', r[1])))
 for r in re.findall(r"{% url '(\w+)'((?:\s+\w+\s*=\s*\w+)*) \s*%}", inputtext)]

Demo:

>>> inputtext = "{% url 'funcname'  first =fir    second = sec    third=thi    %}"
>>> [r[:1] + tuple(chain(*re.findall(r'(\w+)\s*=\s*(\w+)', r[1])))
...  for r in re.findall(r"{% url '(\w+)'((?:\s+\w+\s*=\s*\w+)*) \s*%}", inputtext)]
[('funcname', 'first', 'fir', 'second', 'sec', 'third', 'thi')]
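
If both forms can occur in the same input, one option is to make the "= value" part optional and pull every word out of the captured tail. This is only a sketch under that assumption; TAG and parse_url_tags are hypothetical names, and it swaps the chain(...) step for a plain \w+ search over the tail:

```python
import re

# Each argument is a word, optionally followed by "= word".
# Whitespace around "=" and before "%}" may have any length.
TAG = re.compile(r"{%\s*url\s+'(\w+)'((?:\s+\w+(?:\s*=\s*\w+)?)*)\s*%}")

def parse_url_tags(text):
    """Return one flat tuple per url tag: (name, word, word, ...)."""
    return [(name,) + tuple(re.findall(r"\w+", tail))
            for name, tail in TAG.findall(text)]

print(parse_url_tags("{% url 'funcname'  first=fir    second   = sec %}"))
```

Because the tail has already been validated by the outer pattern, the inner \w+ search cannot pick up anything except argument names and values.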
