andreSmol

Reputation: 1038

How are regex quantifiers applied?

I have the following regex:

import re

res = re.finditer(r'(?:\w+[ \t,]+){0,4}my car', txt, re.IGNORECASE | re.MULTILINE)
for item in res:
    print(item.group())

When I use this regex with the following string:

"my house is painted white, my car is red. A horse is galloping very fast in the road, I drive my car slowly."

I am getting the following results:

house is painted white, my car
the road, I drive my car

My question is about the quantifier {0,4}, which should apply to the whole group. The group collects a word with the expression \w+ and some separator characters with [ \t,]+. Does the quantifier apply only to the "words" defined by \w+? In the results I am getting 4 words plus the spaces and the comma between them, so it's unclear to me what is being counted.

Upvotes: 0

Views: 197

Answers (1)

A_Elric

Reputation: 3568

So, here's what's happening. You're using ?: to make a non-capturing group. Inside it, \w+ matches one or more word characters (that is, one word), and [ \t,]+ matches one or more of the characters that follow it: spaces, tabs, or commas. The {0,4} quantifier applies to the whole group, not just to \w+, so each repetition is one word together with its trailing separators, and the group can repeat between 0 and 4 times. The regex therefore finds "my car" and also matches up to 4 words immediately before it, with the commas and spaces between them consumed by the character set you specified.

Broken down piece by piece:

(?: -- Start of the non-capturing group
\w+ -- One or more word characters (one word)
[ \t,]+ -- One or more spaces, tabs, or commas
) -- End of the non-capturing group
{0,4} -- Repeat the whole group 0 to 4 times
my car -- Then the literal text "my car"
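To make the "one repetition = one word plus its separators" idea concrete, here is a small sketch that splits what the group consumed into its individual repetitions. The outer capturing parentheses and the names unit, pattern, and repetitions are additions for illustration only; they are not part of the original regex.

```python
import re

txt = ("my house is painted white, my car is red. A horse is galloping "
       "very fast in the road, I drive my car slowly.")

# One repetition of the group: a word plus the separators that follow it.
unit = r'\w+[ \t,]+'

# Same regex as in the question, with an extra capturing group (added here
# for illustration) around the repeated part so it can be inspected.
pattern = re.compile(r'((?:' + unit + r'){0,4})my car', re.IGNORECASE)

repetitions = []
for m in pattern.finditer(txt):
    # group(1) is everything the quantified group consumed before "my car".
    pieces = re.findall(unit, m.group(1))
    repetitions.append(pieces)
    print(pieces)
```

Each printed list has four entries, and each entry is a word with its trailing space or comma attached, which is why the separators do not count as extra repetitions.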

As a result, this matches up to 4 words, each followed by its spaces, commas, or tabs, immediately before each appearance of "my car".

This is working as written.
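If you want to convince yourself that the upper bound counts whole word-plus-separator units, a quick experiment is to change it and compare the matches. The helper name matches and its max_words parameter are made up for this sketch.

```python
import re

txt = ("my house is painted white, my car is red. A horse is galloping "
       "very fast in the road, I drive my car slowly.")

def matches(max_words):
    # Build the question's regex with a different upper bound on the group.
    pattern = r'(?:\w+[ \t,]+){0,%d}my car' % max_words
    return [m.group() for m in re.finditer(pattern, txt, re.IGNORECASE)]

for n in (1, 2, 4):
    print(n, matches(n))
```

With an upper bound of 1 the matches are "white, my car" and "drive my car"; with 2 they are "painted white, my car" and "I drive my car"; with 4 you get the two results from the question. The number of words before "my car" tracks the upper bound, and the commas and spaces come along with each word.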

Upvotes: 1
