Myjab

IP address parsing using regex in Python

I am trying to parse IP addresses from a string:

>>> import re
>>> input_str = '''
kjhdkjfh shfkjdsh shfk 1.1.1.1 kaseroi 1.1.1.1 jsoiu 1.1.1.1 
1
1
11
123
132132.23213.213213.123213
23.23.23.23 2321321.33.3.3.3 3.3..3.3.3.3.3. 
3.3.3.3.3.3

3.3.3.3
34.5.6.7
agdi 123213.44.4.5 12.12.12.12
'''
>>> 
>>> 
>>> pattern = r"\b(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\b"
>>> re.findall(pattern, input_str)
['1.1.1.1', '1.1.1.1', '1.1.1.1', '23.23.23.23', '33.3.3.3', '3.3.3.3', '3.3.3.3', '3.3.3.3', '34.5.6.7', '12.12.12.12']
>>>

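But `33.3.3.3` (from `2321321.33.3.3.3`) and two of the `3.3.3.3` matches (from `3.3..3.3.3.3.3.` and `3.3.3.3.3.3`) are only fragments of longer dotted strings. The list of valid IPs should be: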

['1.1.1.1', '1.1.1.1', '1.1.1.1', '23.23.23.23', '3.3.3.3', '34.5.6.7', '12.12.12.12']

Is there anything wrong with my regex?

Answers (2)

Avinash Raj

You just need to add a negative lookbehind and a negative lookahead to your pattern, so that a match cannot be preceded or followed by a dot:

(?<!\.)\b(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\b(?!\.\d?)
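
As a quick check, here is a minimal sketch (not part of the original answer) that builds this first pattern from a shared octet sub-pattern and runs it on fragments taken from the question's input. The names `OCTET`, `DOT_GUARDED` and `sample` are illustrative only.

```python
import re

# Octet alternation from the question: matches 0-255 (also one leading zero, e.g. "01")
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])"

# First variant: \b boundaries plus guards so that no "." may touch the match.
# (?!\.\d?) behaves like (?!\.) because \d? can match the empty string.
DOT_GUARDED = rf"(?<!\.)\b(?:{OCTET}\.){{3}}{OCTET}\b(?!\.\d?)"

sample = "1.1.1.1 2321321.33.3.3.3 3.3..3.3.3.3.3. 3.3.3.3.3.3 34.5.6.7"
print(re.findall(DOT_GUARDED, sample))  # ['1.1.1.1', '34.5.6.7']
```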

OR

(?<!\S)(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])(?!\S)

  • (?<!\S) Negative lookbehind: asserts that the match is not preceded by a non-whitespace character, i.e. it is preceded by whitespace or starts at the beginning of the string.
  • (?!\S) Negative lookahead: asserts that the match is not followed by a non-whitespace character, i.e. it is followed by whitespace or ends at the end of the string.
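
A minimal sketch of how these two lookarounds behave at the edges, assuming the same octet sub-pattern as the question (`OCTET` and `WS_BOUNDED` are illustrative names). Because `(?<!\S)` only forbids a non-space character, it also succeeds at the very start of the string, and `(?!\S)` likewise succeeds at the very end.

```python
import re

OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])"

# Second variant: the match must not touch any non-whitespace character
WS_BOUNDED = rf"(?<!\S)(?:{OCTET}\.){{3}}{OCTET}(?!\S)"

# Start and end of string count as "not preceded/followed by non-whitespace"
print(re.findall(WS_BOUNDED, "1.1.1.1"))               # ['1.1.1.1']
# Any adjacent non-space character (a letter or a dot) rejects the candidate
print(re.findall(WS_BOUNDED, "x1.1.1.1 1.1.1.1x"))     # []
print(re.findall(WS_BOUNDED, "3.3.3.3.3.3\n3.3.3.3"))  # ['3.3.3.3']
```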

Code:

>>> re.findall(r'(?<!\S)(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])(?!\S)', input_str)
['1.1.1.1', '1.1.1.1', '1.1.1.1', '23.23.23.23', '3.3.3.3', '34.5.6.7', '12.12.12.12']

nu11p01n73R

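You cannot use \b alone to delimit the match. \b matches at every transition between a word character and a non-word character, and since . is a non-word character, there is a word boundary on both sides of every dot. That is why your pattern finds 33.3.3.3 inside 2321321.33.3.3.3. In your input string the IPs are delimited by whitespace, so \s is a much better choice.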
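
To see this, here is a small sketch (illustrative names, same octet sub-pattern as the question) that lists the positions where `\b` matches in `1.2` and reproduces the unwanted `33.3.3.3` match.

```python
import re

OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])"
WORD_BOUNDED = rf"\b(?:{OCTET}\.){{3}}{OCTET}\b"  # the question's pattern

# \b matches on both sides of the ".", not just at the ends of "1.2"
positions = [m.start() for m in re.finditer(r"\b", "1.2")]
print(positions)  # [0, 1, 2, 3]

# So a match can start right after a "." inside a longer dotted token
print(re.findall(WORD_BOUNDED, "2321321.33.3.3.3"))  # ['33.3.3.3']
```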

Replacing \b with lookarounds for \s serves the purpose:

>>> pattern = r"(?<=\s)(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])(?=\s)"
>>> re.findall(pattern, input_str)
['1.1.1.1', '1.1.1.1', '1.1.1.1', '23.23.23.23', '3.3.3.3', '34.5.6.7', '12.12.12.12']
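
One difference worth noting: `(?<=\s)` and `(?=\s)` require an actual whitespace character, so an address at the very start or end of the string is missed, while `(?<!\S)` and `(?!\S)` also accept those positions. The question's `input_str` starts and ends with a newline, so both versions give the same result there. A minimal comparison (all names are illustrative):

```python
import re

OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])"
IP = rf"(?:{OCTET}\.){{3}}{OCTET}"

POSITIVE_WS = rf"(?<=\s){IP}(?=\s)"      # needs real whitespace on both sides
NEGATIVE_NON_WS = rf"(?<!\S){IP}(?!\S)"  # whitespace or a string boundary

text = "1.1.1.1 34.5.6.7"  # no whitespace before the first IP or after the last
print(re.findall(POSITIVE_WS, text))      # []
print(re.findall(NEGATIVE_NON_WS, text))  # ['1.1.1.1', '34.5.6.7']
```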
