Webair

Reputation: 105

Why is the regex not greedy here?

The following code works as I expected, but I have one question:

import re

names_email="Harry Rogers    [email protected]"

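# Raw string so that \w and \s reach the regex engine unchanged
name_match = re.compile(r"([\w\s]*)(\s)([\w.]*@[\w.]*)")
name = name_match.search(names_email)
print(name.group(3))  # the email address
print(name.group(1))  # the name, including trailing spaces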

[email protected]
Harry Rogers   

But why doesn't ([\w\s]*), being greedy, match everything after Harry Rogers as well? Why does it instead look for the best possible match for ([\w\s]*)(\s) together?

Upvotes: 0

Views: 76

Answers (2)

revo

Reputation: 48761

But why doesn't ([\w\s]*), being greedy, match everything after Harry Rogers as well?

The first capturing group cannot take all four spaces after Rogers, because the next group, (\s), must still match one whitespace character once the first group is done.

In other words, [\w\s]* first matches greedily all the way up to the @ character, then backtracks one character at a time until (\s) can match. That happens at the space immediately before the h in harri, which leaves the first capturing group with Harry Rogers followed by three space characters.
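To see where each group ends up after backtracking, you can print the span of every group. This is only a minimal sketch: the address `harri@example.com` is a stand-in I made up, since the real one is not shown in the question.

```python
import re

# Hypothetical sample input: the address is a stand-in, not the one from the question.
text = "Harry Rogers    harri@example.com"

pattern = re.compile(r"([\w\s]*)(\s)([\w.]*@[\w.]*)")
match = pattern.search(text)

# Print each group's text and its (start, end) position in the input.
for index in (1, 2, 3):
    print(index, repr(match.group(index)), match.span(index))

# 1 'Harry Rogers   ' (0, 15)
# 2 ' ' (15, 16)
# 3 'harri@example.com' (16, 33)
```

Group 1 stops at index 15, one character short of the last space, because that final space is the only way (\s) can succeed before the address begins.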

Upvotes: 1

Menglong Li

Reputation: 2255

That's because (\s) matches exactly one whitespace character. If you want group(1) to match only "Harry Rogers" without the trailing spaces, the code should look like this:

import re

names_email = "Harry Rogers    [email protected]"

name_match = re.compile(r"([\w\s]*?)(\s+)([\w.]*@[\w.]*)")
name = re.search(name_match, names_email)
print(name.groups())
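For comparison, here is a small sketch that runs the greedy pattern from the question and the lazy variant side by side. The input address `harri@example.com` is an assumption of mine, used only so the snippet runs on its own.

```python
import re

# Hypothetical sample input; the address is a stand-in for the one in the question.
text = "Harry Rogers    harri@example.com"

# Greedy first group, single whitespace in the second group (the question's pattern).
greedy = re.compile(r"([\w\s]*)(\s)([\w.]*@[\w.]*)")
# Lazy first group, one or more whitespace characters in the second group.
lazy = re.compile(r"([\w\s]*?)(\s+)([\w.]*@[\w.]*)")

greedy_groups = greedy.search(text).groups()
lazy_groups = lazy.search(text).groups()

print(greedy_groups)  # ('Harry Rogers   ', ' ', 'harri@example.com')
print(lazy_groups)    # ('Harry Rogers', '    ', 'harri@example.com')
```

With the lazy quantifier the first group takes as little as it can, so all four spaces go to the second group instead.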

Upvotes: 0
