Reputation: 2544
The regex only captures the last repetition of a group when the number of repetitions is unknown (>= 0):
>>> re.findall(r"{% url '(\w+)'(?:\s+(\w+))* %}","{% url 'a' b %}")
[('a', 'b')]
>>> re.findall(r"{% url '(\w+)'(?:\s+(\w+))* %}","{% url 'a' b c %}")
[('a', 'c')]
>>> re.findall(r"{% url '(\w+)'(?:\s+(\w+))* %}","{% url 'a' b c e %}")
[('a', 'e')]
How can I grab all the groups, like this (the output I imagine):
>>> re.findall(r"{% url '(\w+)'(?:\s+(\w+))* %}","{% url 'a' b %}")
[('a', 'b')]
>>> re.findall(r"{% url '(\w+)'(?:\s+(\w+))* %}","{% url 'a' b c %}")
[('a', 'b', 'c')]
>>> re.findall(r"{% url '(\w+)'(?:\s+(\w+))* %}","{% url 'a' b c e %}")
[('a', 'b', 'c', 'e')]
Note: this is a simplified case to make my question easy to understand, so solutions such as s.split() won't work for the more complicated one.
My real need is (note that the amount of whitespace is unknown, >= 1):
grab ["'funcname'", 'first'] from "{% url 'funcname' first %}"
grab ["'funcname'", 'first', 'second'] from "{% url 'funcname' first second %}"
grab ["'funcname'", 'first', 'second','third'] from "{% url 'funcname' first second third %}"
Or more complicated:
grab ["'funcname'", 'first','fir'] from "{% url 'funcname' first = fir %}"
grab ["'funcname'", 'first','fir', 'second', 'sec'] from "{% url 'funcname' first=fir second = sec %}"
grab ["'funcname'", 'first','fir', 'second', 'sec', 'third', 'thi'] from "{% url 'funcname' first =fir second = sec third=thi %}"
Upvotes: 0
Views: 47
Reputation: 1122222
You put a multiplier around the group:
(?:\s+(\w+))*
but groups do not multiply; each group has a fixed group number, and every repetition is assigned to that same number. Hence you only ever see the last match.
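To make that concrete, here is a minimal sketch using the pattern and input from the question. The variable names are only illustrative. It shows that the compiled pattern has exactly two groups no matter how often the second one repeats:

```python
import re

# Pattern taken from the question; the second group sits inside a repeated
# non-capturing group.
pattern = re.compile(r"{% url '(\w+)'(?:\s+(\w+))* %}")
match = pattern.search("{% url 'a' b c e %}")

# The number of capturing groups is fixed when the pattern is compiled.
print(pattern.groups)   # 2

# Each repetition overwrites group 2, so only the last value survives.
print(match.groups())   # ('a', 'e')
```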
You'll have to capture all candidates in one group and split afterwards:
[r[:1] + tuple(r[1].split())
for r in re.findall(r"{% url '(\w+)'((?:\s+\w+)*) %}", inputtext)]
Note that the capturing group now captures all of the (?:\s+\w+)* pattern.
Demo:
>>> import re
>>> inputtext = "{% url 'a' b c e %}"
>>> [r[:1] + tuple(r[1].split())
... for r in re.findall(r"{% url '(\w+)'((?:\s+\w+)*) %}", inputtext)]
[('a', 'b', 'c', 'e')]
Your second form is more complex, and requires that you use another regular expression to split out the matches:
from itertools import chain
[r[:1] + tuple(chain(*re.findall(r'(\w+)\s*=\s*(\w+)', r[1])))
for r in re.findall(r"{% url '(\w+)'((?:\s+\w+\s*=\s*\w+)*) \s*%}", inputtext)]
Demo:
>>> inputtext = "{% url 'funcname' first =fir second = sec third=thi %}"
>>> [r[:1] + tuple(chain(*re.findall(r'(\w+)\s*=\s*(\w+)', r[1])))
... for r in re.findall(r"{% url '(\w+)'((?:\s+\w+\s*=\s*\w+)*) \s*%}", inputtext)]
[('funcname', 'first', 'fir', 'second', 'sec', 'third', 'thi')]
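If you want a single routine that covers both forms and keeps the quotes around the function name, as in the lists in your question, one looser option is to capture the whole tag body and tokenise it afterwards. This is only a sketch: `parse_url_tags`, `TAG_RE` and `TOKEN_RE` are hypothetical names, and the tokeniser simply skips `=` and whitespace without checking that the arguments are well formed.

```python
import re

# Hypothetical helper names; a sketch, not a validated Django tag parser.
# Capture everything between "{% url" and "%}" as one group.
TAG_RE = re.compile(r"{%\s*url\s+(.*?)\s*%}")
# A token is either a quoted name (quotes kept) or a bare word.
TOKEN_RE = re.compile(r"'\w+'|\w+")


def parse_url_tags(text):
    """Return one list of tokens for each {% url ... %} tag in text."""
    return [TOKEN_RE.findall(body) for body in TAG_RE.findall(text)]


print(parse_url_tags("{% url 'funcname' first second %}"))
print(parse_url_tags("{% url 'funcname' first =fir second = sec third=thi %}"))
```

Because the `=` signs are dropped rather than matched, this accepts malformed input such as `first = = fir` without complaint; use the stricter two-step pattern above if you need the structure enforced.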
Upvotes: 1