Reputation: 59

Why does Python return the following?

import re

p = re.compile('x*')
print(p.search('abxd'))

output:

<re.Match object; span=(0, 0), match=''>

p = re.compile('x+')
print(p.search('abxd'))

output:

<re.Match object; span=(2, 3), match='x'>

Upvotes: 2

Answers (2)

Josh Honig

Reputation: 187

What you're seeing is Python (more specifically, the re module) returning a re.Match object. This object has methods and attributes that you can use to get the results you want.

For example, if you just want the match as a string, this code would print it:

>>> import re
>>> expression = re.compile('.+')
>>> result = expression.search('abcd')
>>> print(result.group())
abcd

Your current code (the first example) uses an expression, x*, that is allowed to match zero characters, so it matches the empty string at the very start of the input; that may be the source of some confusion. In the example I've provided, I used .+, which matches any character (except a newline) one or more times. Regex101.com does a wonderful job of helping you create regular expressions and understand the syntax.

Here's the documentation for the Regex Match object (what you're getting in your current code), and specifically, here is the documentation for the .group() method.
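To make the Match object a bit more concrete, here is a minimal sketch using the x+ pattern and the 'abxd' string from the question. The variable names (result, matched_text, start, end) are just illustrative choices of mine; .group() and .span() are the standard Match methods.

```python
import re

# Search for one or more 'x' characters in the string from the question.
result = re.search(r'x+', 'abxd')

# search() returns None when nothing matches, so check before using the result.
if result is not None:
    matched_text = result.group()  # the substring that matched
    start, end = result.span()     # (start, end) indices; end is exclusive
    print(matched_text, start, end)
```

The same values are also available individually through result.start() and result.end().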

Quick note:

You don't need to compile an expression before calling the search function. Whether you do is largely a matter of preference; there are valid reasons to compile first (for example, reusing the same pattern many times), but that's outside the scope of this question. The following two code blocks do the same thing:

expression = re.compile('.+')
print(expression.search('string'))

print(re.search(r'.+', 'string'))

In the second block, the expression is the first argument to the search function. The r in front of it marks it as a raw string literal, so backslashes in the pattern are not treated as Python escape sequences.
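As a small illustration of what the r prefix does, here is a sketch comparing a raw and a non-raw string literal. The \d+ pattern and the 'abc123' input are my own examples, not something from the question.

```python
import re

# Without the r prefix, the backslash has to be doubled inside the literal.
plain = '\\d+'
raw = r'\d+'

# Both literals produce the same three characters: backslash, d, plus.
same_pattern = plain == raw

# Either one can be passed to re.search(); here the raw form is used.
found = re.search(raw, 'abc123').group()
print(same_pattern, found)
```

The prefix only changes how Python reads the string literal; the re module receives an ordinary string either way.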

Hope this helps!

Upvotes: 3

Hugo G

Reputation: 16494

As the docs for re.search() say:

Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

(emphasis mine)
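The difference between "no match" and "a zero-length match" can be sketched like this. I'm assuming a test string 'abcd' that contains no x at all; the variable names are only for illustration.

```python
import re

text = 'abcd'  # contains no 'x' at all

no_match = re.search(r'x+', text)     # needs at least one 'x', so nothing matches
empty_match = re.search(r'x*', text)  # zero 'x's is acceptable, so this matches

print(no_match)                   # None
print(empty_match.span())         # (0, 0)
print(empty_match.group() == '')  # True
```

Note that a Match object is truthy even when the matched text is empty, so a check like `if empty_match:` still succeeds here.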

Your first RegEx x* matches zero or more of character x. Of all the matches, the first one is returned. Your pattern matches at the very beginning of your string, because zero 'x's is a valid match there. Hence your match starts and ends at position 0 (<re.Match object; span=(0, 0), match=''>).

When you search for x+ that means one or more of character x. The only x in your string is at position 2 (third character, but we start counting at 0). It's one character long, so the match ends at position 3 (the end index is exclusive). Hence your result <re.Match object; span=(2, 3), match='x'>.

If you looked through all matches rather than just the first one, you'd see other matches too! You can do that using, for example, re.findall().

Example:

>>> re.findall(r'x+', 'abxb')
['x']
>>> re.findall(r'x*', 'abxb')
['', '', 'x', '', '']

As you can see, matching zero or more means we also match the empty strings between our letters! This feature of zero or more is much more useful when combined with other patterns, e.g. if we want to say that a character or word is optional in our match. Let's say we wanted to match every b that is followed by zero or more x characters:

>>> re.findall(r'bx*', 'abxb')
['bx', 'b']
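If you want to see where each of those matches sits in the string, re.finditer() yields Match objects instead of plain strings. This is only a sketch; the spans list and the tuple layout are my own choices, and the exact number of empty matches can differ on Python versions older than 3.7.

```python
import re

# Collect (start, end, text) for every match of x* in the example string.
spans = [(m.start(), m.end(), m.group()) for m in re.finditer(r'x*', 'abxb')]

for start, end, text in spans:
    # Empty matches have start == end and an empty text.
    print(start, end, repr(text))
```

Each empty match has identical start and end positions, which is the same thing span=(0, 0) is telling you in the question's output.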

Upvotes: 2
