user3369157

Reputation: 137

Regular expression result

I have the code below:

import re

line = "78349999234"

# Look for a run of 9s anywhere in the string.
searchObj = re.search(r'9*', line)

if searchObj:
    print("searchObj.group() : ", searchObj.group())
else:
    print("Nothing found!!")

However, the output is empty. I thought * means: "Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. ab* will match ‘a’, ‘ab’, or ‘a’ followed by any number of ‘b’s." Why don't I see any result in this case?

Upvotes: 5

Views: 66

Answers (2)

Avinash Raj

Reputation: 174706

The main reason is that re.search stops searching as soon as it finds a match. 9* means "match the digit 9 zero or more times". Because an empty string exists before each and every character, re.search stops after finding the first empty match, which sits at the very start of the string. That's why you got an empty string as output.
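To make this visible, here is a minimal sketch (Python 3, using the same string as in the question) that prints where the match actually sits. Note that a match object is always truthy, even when it matched nothing, which is why the `if searchObj:` branch still runs.

```python
import re

line = "78349999234"

# re.search scans from index 0 and returns the first position
# at which the pattern can match; 9* can match "nothing" right there.
m = re.search(r"9*", line)

print(m.span())         # (0, 0): a zero-width match at the start
print(repr(m.group()))  # '' (the empty string)
print(bool(m))          # True: a match object is truthy even if empty
```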

Upvotes: 1

willeM_ Van Onsem

Reputation: 476594

I think the regular expression is matched left to right, so the first match is the empty string before the 7. If it finds a 9, it will indeed match greedily and try to "eat" (that's the correct terminology) as many characters as possible.

If you query for:

>>> print(re.findall(r'9*', line))
['', '', '', '', '9999', '', '', '', '']

It matches all the empty strings between the characters, and as you can see, 9999 is matched as well.
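If it helps to see where each of those matches sits, a small sketch with re.finditer can list the start and end index of every match. The variable name `matches` is just for illustration.

```python
import re

line = "78349999234"

# Collect (start, end, text) for every match of 9* in the string.
matches = [(m.start(), m.end(), m.group()) for m in re.finditer(r"9*", line)]

for start, end, text in matches:
    # Zero-width matches have start == end; the run of 9s spans 4..8.
    print(start, end, repr(text))
```

The first entry is the zero-width match at index 0, which is the one re.search returns; the 9999 run only shows up as the fifth match.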

The main reason is probably performance: if you search for a pattern in a string of 10M+ characters, you're very happy if a match already turns up in the first 10k characters. You don't want to waste effort on finding the "nicest" match.


EDIT

"0 or more occurrences" means the group (in this case 9) is repeated zero or more times. In an empty string, the character is repeated exactly 0 times. If you want to match patterns where the character is repeated one or more times, you should use

9+

This results in:

>>> print(re.search(r'9+', line))
<_sre.SRE_Match object; span=(4, 8), match='9999'>

Using re.search with a pattern that accepts the empty string is probably not that helpful, since it will always match at the very start of the string, and that match is empty unless the string happens to begin with a 9.
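As a rough sketch of how this might be used in practice, the helper below wraps the 9+ search and returns None when there is no 9 at all. The function name `first_run_of_nines` is made up for this example, not something from the re module.

```python
import re

def first_run_of_nines(text):
    """Hypothetical helper: return the first run of one or more 9s, or None."""
    m = re.search(r"9+", text)
    # With 9+ an empty match is impossible, so "no match" really means "no 9".
    return m.group() if m else None

print(first_run_of_nines("78349999234"))  # 9999
print(first_run_of_nines("12345"))        # None
```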

Upvotes: 5
