surya

Reputation: 253

Reaching a middle ground between search() and findall() in regular expressions

While writing regular expressions for IP addresses in a personal project, I ran into the following problem.

import re

pattern = r'123\.145\.167\.[0-9]{1,2}'
source = "123.145.167.0, 123.145.167.99, 123.145.167.100"
n = re.search(pattern, source)
print(n.group())  # search() returns only the first match


import re

pattern = r'123\.145\.167\.[0-9]{1,2}'
source = "123.145.167.0, 123.145.167.99, 123.145.167.100"
n = re.compile(pattern)
print(n.findall(source))  # findall() returns every match as a list

search matches only the first address in the source string, while findall returns a truncated match for the last address:

['123.145.167.0', '123.145.167.99', '123.145.167.10']

Is it possible to obtain matches for both 123.145.167.0 and 123.145.167.99 but not for 123.145.167.100?

I have already gone through "python - regex search and findall" and still cannot see how to solve my problem.

Upvotes: 2

Views: 83

Answers (3)

DevKeh

Reputation: 1

You need to define a boundary for your match. 123.145.167.10 is a prefix of 123.145.167.100, so the pattern matches it. You can use the \b word-boundary anchor to prevent that.

r"\b123\.145\.167\.[0-9]{1,2}\b"
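As a quick check, here is a minimal sketch that runs this pattern through re.findall on the sample string from the question (the variable names are only for illustration):

```python
import re

# Sample string taken from the question.
source = "123.145.167.0, 123.145.167.99, 123.145.167.100"

# \b on each side requires a word boundary, so the final octet cannot be
# followed by another digit (digits count as word characters).
pattern = r"\b123\.145\.167\.[0-9]{1,2}\b"

matches = re.findall(pattern, source)
print(matches)  # ['123.145.167.0', '123.145.167.99']
```

The address ending in .100 is skipped because neither "1" nor "10" is followed by a word boundary there.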

Upvotes: 0

6502

Reputation: 114539

You can use a lookahead assertion:

pattern = r'123\.145\.167\.[0-9]{1,2}(?=[^0-9]|$)'

The part

(?=[^0-9]|$)

checks that the match is followed by either a non-digit character or the end of the string. The check does not consume any characters; it only decides whether the expression matches. With this approach findall returns the result you're looking for.

From the documentation:

(?=...) Matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.
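A minimal sketch of the lookahead in use, assuming the sample string from the question. The second input, with a letter directly after the address, is a hypothetical case added only to show where the lookahead and a \b word boundary can differ:

```python
import re

# Sample string taken from the question.
source = "123.145.167.0, 123.145.167.99, 123.145.167.100"

# The lookahead requires a non-digit or end of string after the last octet,
# without making that character part of the match.
lookahead = r'123\.145\.167\.[0-9]{1,2}(?=[^0-9]|$)'
matches = re.findall(lookahead, source)
print(matches)  # ['123.145.167.0', '123.145.167.99']

# Hypothetical input: a letter immediately follows the address.
tricky = "123.145.167.5a"
with_lookahead = re.findall(lookahead, tricky)
with_boundary = re.findall(r'123\.145\.167\.[0-9]{1,2}\b', tricky)
print(with_lookahead)  # ['123.145.167.5'] because 'a' is a non-digit
print(with_boundary)   # [] because there is no word boundary between '5' and 'a'
```

Since (?=...) is not a capturing group, findall still returns the whole match rather than a group. Which behaviour is preferable for the letter case depends on what the surrounding text looks like.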

Upvotes: 1

user764357

Reputation:

Throw a word boundary on the end: \b.

import re

pattern = r'123\.145\.167\.[0-9]{1,2}\b'
source = "123.145.167.0, 123.145.167.99, 123.145.167.100"
n = re.compile(pattern)
print(n.findall(source))

Gives:

['123.145.167.0', '123.145.167.99']

Upvotes: 1
