Reputation: 15588
Is there any way to combine groups and the * feature of regular expressions so that they act kind of like a tokenizer / splitter? I tried this:
import re
my_str = "foofoofoofoo"
pattern = r"(foo)*"
result = re.search(pattern, my_str)
I was hoping my groups might look like
("foo", "foo", "foo", "foo")
But they do not. I was surprised by this because the ? and group features do work together:
my_str = "Mr foo"
pattern = "(Mr)? foo"
result = re.search(pattern, my_str)
Upvotes: 5
Views: 1816
Reputation: 92976
The problem is that you are repeating your only capturing group. There is one pair of parentheses, so there is one capturing group, and its contents are overwritten each time it matches.
See "Repeating a Capturing Group vs. Capturing a Repeated Group" on regular-expressions.info for more information. (Capturing a repeated group is not what you want here either.)
So, after your regex is done, your capturing group 1 will contain the last found "foo".
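To make the overwriting visible, here is a small sketch using the pattern and string from the question. It only inspects the match object with the standard re module; the variable names are carried over from the question.

```python
import re

my_str = "foofoofoofoo"
result = re.search(r"(foo)*", my_str)

# The whole match covers all four repetitions...
whole_match = result.group(0)

# ...but there is still only one capturing group, and it holds
# whatever the last repetition captured.
all_groups = result.groups()
last_capture = result.group(1)

print(whole_match)   # foofoofoofoo
print(all_groups)    # ('foo',)
print(last_capture)  # foo
```

So the number of groups is fixed by the number of parentheses in the pattern, not by how many times the group repeats.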
This would give you the expected result:
import re
my_str = "foofoofoofoo"
pattern = r"foo"
result = re.findall(pattern, my_str)
result is then a list ['foo', 'foo', 'foo', 'foo']
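If you also want the "the whole string must be made of repeated tokens" behaviour that (foo)* implied, one possible approach is to validate first and split second. This is only a sketch: split_repeats is a hypothetical helper name, not something from the re module, and it assumes the token is a literal string.

```python
import re

def split_repeats(token, text):
    """Hypothetical helper: return the repeated tokens, or None if
    text is not made up entirely of repetitions of token."""
    escaped = re.escape(token)
    # Validate with a non-capturing repeated group.
    if re.fullmatch(f"(?:{escaped})*", text) is None:
        return None
    # Then collect every occurrence with findall.
    return re.findall(escaped, text)

tokens = split_repeats("foo", "foofoofoofoo")
rejected = split_repeats("foo", "foofoobar")
print(tokens)    # ['foo', 'foo', 'foo', 'foo']
print(rejected)  # None
```

Plain findall on its own would quietly skip the trailing "bar" in the second call, which is why the validation step is kept separate.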
Upvotes: 4
Reputation: 142136
The built-in re module does not keep every repetition of a capture group under *; only the last one is retained. Use findall instead.
There is a library called regex on PyPI that I believe supports this, and it has a few other features, such as variable-length lookbehind.
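As a rough sketch of what that looks like: the regex package exposes a captures() method on match objects that returns every capture of a group, not just the last. The snippet assumes regex has been installed from PyPI; the fallback to re.findall when it is missing is only there so the example still runs and is not part of either library.

```python
import re

my_str = "foofoofoofoo"

try:
    import regex  # third-party package: pip install regex
except ImportError:
    regex = None  # assumption: fall back to the standard library

if regex is not None:
    m = regex.fullmatch(r"(foo)*", my_str)
    # captures(1) lists every repetition matched by group 1.
    tokens = m.captures(1)
else:
    tokens = re.findall(r"foo", my_str)

print(tokens)  # ['foo', 'foo', 'foo', 'foo']
```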
Upvotes: 3