Alkodemik

Reputation: 13

{m,n} doesn't match below m but matches above n. Why?

When I try to check a string's length with the regex \w{m,n}, it doesn't match strings shorter than m, as expected, but it does match strings longer than n.

>>> import re
>>> expression = r'\w{4,32}'
>>> string = 'a'*3
>>> print(re.match(expression, string))
None
>>> string = 'a'*100
>>> output = re.match(expression, string)
>>> len(output.string)
100

Why does it happen this way? How should I use it?

Upvotes: 1

Views: 100

Answers (3)

Casimir et Hippolyte

Reputation: 89614

You must use word boundaries:

>>> expression = r'\b\w{4,32}\b'

A word boundary \b is a zero-width position between a character from \w and a character that is not from \w; the start and the end of the string count as non-\w here.
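As a rough illustration of the word-boundary approach (a sketch only; the sample strings are made up for the example), note that the pattern has to be a raw string, because in a normal Python string literal '\b' is the backspace character rather than a word boundary:

```python
import re

# In a non-raw literal, '\b' is a single backspace character (0x08),
# so the pattern must be written as a raw string.
backspace_literal = '\b'

bounded = re.compile(r'\b\w{4,32}\b')

# 100 word characters: no boundary exists after 4..32 of them, so no match.
too_long = bounded.match('a' * 100)

# A boundary only requires a non-word character next, not the end of the string.
first_word = bounded.match('abcd efgh')

# With findall, the same pattern picks out every word of 4 to 32 characters.
words = bounded.findall('an apple fell down')
```

Here too_long is None, first_word.group(0) is 'abcd', and words is ['apple', 'fell', 'down']. So \b rejects over-long words but still allows other text after the word, which may or may not be what you want for validating a whole input.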

Upvotes: 1

Mp0int

Reputation: 18737

You are expecting the input to be between 4 and 32 characters. But what about word-only strings that are longer than 32 characters? Your regex checks only the first 32 characters and does not care what comes after them, so the 33rd character can be anything.

So:

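expression = r'\W*\w{4,32}\W*$'

means your string may start with any non-word characters (\W*), followed by a word of between 4 and 32 characters (\w{4,32}), and may end with any non-word characters (\W*). * means 0 or more repetitions, so a word-only input of between 4 and 32 characters is accepted as well. The trailing $ is needed so that nothing else can follow; without it, re.match would still accept a longer word by matching only its first 32 characters.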

In your code the regex test passes, so output is a match object, and len(output.string) gives the length of the entire input string rather than of the matched part.
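A quick way to see how much an end anchor matters for a pattern of this shape is to compare it with and without a trailing $. This is only a sketch; the variable names and sample inputs are invented for the example:

```python
import re

# Same pattern body, without and with an end-of-string anchor.
unanchored = re.compile(r'\W*\w{4,32}\W*')
anchored = re.compile(r'\W*\w{4,32}\W*$')

long_word = 'a' * 100

# Without '$', match() succeeds by consuming only the first 32 characters.
unanchored_hit = unanchored.match(long_word)

# With '$', the 33rd word character has nowhere to go, so there is no match.
anchored_hit = anchored.match(long_word)

# Surrounding non-word characters are still allowed by the \W* parts.
padded_hit = anchored.match('  hello!!')
```

With these inputs, unanchored_hit.group(0) is 32 characters long, anchored_hit is None, and padded_hit.group(0) is the whole string '  hello!!'.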

Upvotes: 0

Ry-

Reputation: 225125

match matches, by default, at the beginning of the string, but it doesn't also anchor to the end. In the second case the regular expression simply matches the first 32 characters. I think you wanted:

expression = r'^\w{4,32}$'

(len(output.string) is still 100 because .string just refers to the string that the regular expression was matched against, not the part that it actually matched. You can use output.group(0) to find the real match.)
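To make the difference between the matched text and the input string concrete, here is a small sketch. It assumes Python 3.4 or later for re.fullmatch, which is an alternative to writing the anchors by hand and was not part of the original question:

```python
import re

pattern = re.compile(r'\w{4,32}')
long_input = 'a' * 100

# match() anchors only at the start, so it succeeds on the first 32 characters.
prefix_match = pattern.match(long_input)
matched_length = len(prefix_match.group(0))  # the text actually matched
input_length = len(prefix_match.string)      # the whole input string

# Anchoring both ends rejects the over-long input.
anchored = re.match(r'^\w{4,32}$', long_input)

# fullmatch() requires the entire string to match the pattern.
full = pattern.fullmatch(long_input)
ok = pattern.fullmatch('a' * 10)
```

Here matched_length is 32 while input_length is 100, anchored and full are both None, and ok matches the 10-character string. One small caveat: $ also matches just before a trailing newline, so fullmatch (or \Z in place of $) is the stricter choice if that matters for your input.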

Upvotes: 4
