Reputation: 944
I am trying to parse IP addresses from a string:
>>> import re
>>> input_str = '''
kjhdkjfh shfkjdsh shfk 1.1.1.1 kaseroi 1.1.1.1 jsoiu 1.1.1.1
1
1
11
123
132132.23213.213213.123213
23.23.23.23 2321321.33.3.3.3 3.3..3.3.3.3.3.
3.3.3.3.3.3
3.3.3.3
34.5.6.7
agdi 123213.44.4.5 12.12.12.12
'''
>>>
>>>
>>> pattern = r"\b(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\b"
>>> re.findall(pattern, input_str)
['1.1.1.1', '1.1.1.1', '1.1.1.1', '23.23.23.23', '33.3.3.3', '3.3.3.3', '3.3.3.3', '3.3.3.3', '34.5.6.7', '12.12.12.12']
>>>
But the valid IP list is:
['1.1.1.1', '1.1.1.1', '1.1.1.1', '23.23.23.23', '3.3.3.3', '34.5.6.7', '12.12.12.12']
Is there anything wrong with my regex?
Upvotes: 1
Views: 85
Reputation: 174826
You just need to add a negative lookbehind and a negative lookahead to your pattern.
(?<!\.)\b(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\b(?!\.\d?)
OR
(?<!\S)(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])(?!\S)
(?<!\S)
Negative lookbehind: asserts that the match is not preceded by a non-whitespace character, i.e. it is preceded by whitespace or by the start of the string.
(?!\S)
Negative lookahead: asserts that the match is not followed by a non-whitespace character, i.e. it is followed by whitespace or by the end of the string.
Code:
>>> re.findall(r'(?<!\S)(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])(?!\S)', input_str)
['1.1.1.1', '1.1.1.1', '1.1.1.1', '23.23.23.23', '3.3.3.3', '34.5.6.7', '12.12.12.12']
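The first pattern is not run above, so here is a minimal sketch comparing the two on a shortened sample. The names OCTET, dot_guarded, space_guarded and the sample string are made up for illustration; the patterns themselves are the two from this answer. The sample ends with an address wrapped in punctuation (ip=12.12.12.12,) to show where the two variants differ.

```python
import re

# One octet in the range 0-255, as in the original pattern.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])"

# Variant 1: keep \b, but forbid a dot right before or right after the match.
dot_guarded = rf"(?<!\.)\b(?:{OCTET}\.){{3}}{OCTET}\b(?!\.\d?)"
# Variant 2: require whitespace (or a string edge) on both sides.
space_guarded = rf"(?<!\S)(?:{OCTET}\.){{3}}{OCTET}(?!\S)"

# Hypothetical sample, shortened from the question's input.
sample = "1.1.1.1 2321321.33.3.3.3 3.3.3.3.3.3 3.3.3.3 123213.44.4.5 ip=12.12.12.12,"

dot_hits = re.findall(dot_guarded, sample)
space_hits = re.findall(space_guarded, sample)

print(dot_hits)    # ['1.1.1.1', '3.3.3.3', '12.12.12.12']
print(space_hits)  # ['1.1.1.1', '3.3.3.3']
```

Both variants reject the over-long dotted sequences. The dot-guarded one still accepts an address that touches other punctuation such as = or a comma, while the whitespace-guarded one does not, so which to use depends on how the addresses are delimited in your data.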
Upvotes: 3
Reputation: 26667
You cannot use \b to delimit the match, because \b also matches between a digit and a dot (.), so a match can start or end in the middle of a longer dotted sequence. In the input string the IPs are delimited by whitespace, so \s is a much better delimiter.
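To make the \b behaviour concrete, here is a small sketch. OCTET, word_bounded and the short test strings are names chosen for this illustration; the pattern is the one from the question.

```python
import re

# \b matches at every transition between a word character and a non-word
# character, and a dot is a non-word character.
boundaries = [m.start() for m in re.finditer(r"\b", "1.22")]
print(boundaries)  # [0, 1, 2, 4]

OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])"
word_bounded = rf"\b(?:{OCTET}\.){{3}}{OCTET}\b"

# The match is allowed to start right after a dot ...
leaked_start = re.findall(word_bounded, "2321321.33.3.3.3")
# ... and to stop right before one.
leaked_end = re.findall(word_bounded, "3.3.3.3.3.3")

print(leaked_start)  # ['33.3.3.3']
print(leaked_end)    # ['3.3.3.3']
```

These are the same spurious matches that show up in the question's output.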
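Changing the regex to use lookarounds for \s serves the purpose. Note that (?<=\s) and (?=\s) each require an actual whitespace character, which works here because input_str begins and ends with a newline: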
>>> pattern = r"(?<=\s)(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])(?=\s)"
>>> re.findall(pattern, input_str)
['1.1.1.1', '1.1.1.1', '1.1.1.1', '23.23.23.23', '3.3.3.3', '34.5.6.7', '12.12.12.12']
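One caveat, sketched below on a made-up string: the positive lookarounds need a whitespace character on both sides, so an address at the very start or end of the text is skipped. The negative form (?<!\S) ... (?!\S) also accepts the string edges. OCTET, positive, negative and text are illustrative names, not part of the question.

```python
import re

OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])"

# Positive lookarounds: a whitespace character must be present on each side.
positive = rf"(?<=\s)(?:{OCTET}\.){{3}}{OCTET}(?=\s)"
# Negative lookarounds: no non-whitespace character may be adjacent,
# which also holds at the start and end of the string.
negative = rf"(?<!\S)(?:{OCTET}\.){{3}}{OCTET}(?!\S)"

# Hypothetical input with addresses at both edges of the string.
text = "10.0.0.1 then 10.0.0.2 then 10.0.0.3"

positive_hits = re.findall(positive, text)
negative_hits = re.findall(negative, text)

print(positive_hits)  # ['10.0.0.2']
print(negative_hits)  # ['10.0.0.1', '10.0.0.2', '10.0.0.3']
```

If the text may begin or end with an address, either use the negative form or pad the input with whitespace before matching.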
Upvotes: 1