Reputation: 1038
I have the following regex:
import re

res = re.finditer(r'(?:\w+[ \t,]+){0,4}my car', txt, re.IGNORECASE | re.MULTILINE)
for item in res:
    print(item.group())
When I use this regex with the following string:
"my house is painted white, my car is red. A horse is galloping very fast in the road, I drive my car slowly."
I am getting the following results:
My question is about the quantifier {0,4}, which should apply to the whole group. The group collects words with the expression \w+ and some separator characters with [ \t,]+. Does the quantifier apply only to the "words" defined by \w+? In the results I am getting 4 words plus the spaces and commas, and it's unclear to me why.
Upvotes: 0
Views: 197
Reputation: 3568
So, here's what's happening. You're using ?: to make a non-capturing group. Inside it, \w+ matches one or more word characters (one "word"), and [ \t,]+ matches one or more of the characters in the set (a space, a tab, or a comma). {0,4} then repeats the whole non-capturing group between 0 and 4 times, not just the \w+ part. So the regex finds "my car" and matches up to 4 words before it, and the commas and spaces after each word are part of the match because they get consumed by the character set you specified inside the group.
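To see this on the string from the question, here is a minimal sketch that runs the same pattern and collects the matches. The variable names pattern and matches are just for illustration, and re.MULTILINE is left out because the pattern has no ^ or $ anchors for it to affect.

```python
import re

# The sample sentence from the question
txt = ("my house is painted white, my car is red. "
       "A horse is galloping very fast in the road, I drive my car slowly.")

# Same pattern as in the question: up to 4 "word + separators" units, then "my car"
pattern = re.compile(r'(?:\w+[ \t,]+){0,4}my car', re.IGNORECASE)

matches = [m.group() for m in pattern.finditer(txt)]
for m in matches:
    print(repr(m))
# 'house is painted white, my car'
# 'the road, I drive my car'
```

Each match contains 4 words together with the spaces and commas that follow them, which is what you would expect if {0,4} repeats the whole group.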
Broken down piece by piece:
(?: -- Start of the non-capturing group
\w+ -- Match one or more word characters (one word)
[ \t,]+ -- Match one or more spaces, tabs, or commas
) -- End of the non-capturing group
{0,4} -- Repeat the preceding group 0 to 4 times
my car -- Then match the literal text "my car"
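As a result, this matches 0 to 4 words, each followed by its separators (spaces, commas, or tabs), immediately before "my car". The separators show up in the result because they sit inside the group that is being repeated.
This is working as written.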
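If it helps, the same pattern can be written with re.VERBOSE so the comments live inside the regex, and the upper bound can be varied to watch the quantifier act on whole "word + separators" units. This is only a sketch: words_before is a hypothetical helper made up for this example, and it uses a shortened version of the sentence from the question.

```python
import re

txt = "my house is painted white, my car is red."

def words_before(max_words):
    # Hypothetical helper: builds the pattern with a different upper bound
    # and returns the first match in txt.
    pattern = re.compile(
        r'''
        (?:            # non-capturing group: one word plus its separators
            \w+        # one or more word characters (a single word)
            [ \t,]+    # one or more spaces, tabs, or commas
        ){0,%d}        # repeat the whole group 0 to max_words times
        my\ car        # the literal text "my car" (space escaped for VERBOSE)
        ''' % max_words,
        re.IGNORECASE | re.VERBOSE,
    )
    return pattern.search(txt).group()

for n in (1, 2, 4):
    print(n, repr(words_before(n)))
# 1 'white, my car'
# 2 'painted white, my car'
# 4 'house is painted white, my car'
```

Raising the bound by one adds one more word and the separators that follow it, never just the word on its own.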
Upvotes: 1