Reputation: 151
My goal is to locate an IP address inside a text.
Using grep, I was able to do it with the regular expression ([0-9]+\.){3}[0-9]+.
With re from Python, I don't understand why it doesn't work unless I precede the expression inside the parentheses with ?:
I understand that using ?: prevents the creation of a group, but I can't explain the result when this prefix is removed.
>>> s
'64 bytes from 10.11.1.5: icmp_seq=2 ttl=128 time=215 ms'
>>> p=re.compile(r"(?:[0-9]+\.){3}")
>>> p.findall(s)
['10.11.1.']
>>> p=re.compile(r"([0-9]+\.){3}")
>>> p.findall(s)
['1.']
Upvotes: 4
Views: 1744
Reputation: 2475
You could use the following:
import re
s = '64 bytes from 10.11.1.5: icmp_seq=2 ttl=128 time=215 ms'
r = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")
ip = r.findall(s)
print(ip)
Upvotes: 0
Reputation: 388233
As per the documentation (emphasis mine):
re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.
Your first example uses a non-capturing group, so there is “no group” inside the pattern for findall to return. Instead, it returns a list of all results where the full pattern matched. In your case, that is everything the pattern matched: 10.11.1.
In the second example, there is a capturing group, so the sentence about groups in the quoted documentation applies: instead of returning a list of all full matches, you only get a list of groups.
But there is only a single group inside your pattern. That group is matched multiple times, yet each group can hold only a single value; in Python's re module, a repeated group keeps only its last iteration. So for your example, only the last captured value is available in the findall result.
If you want to capture repeated groups, you have to capture them explicitly in a separate, enclosing group, e.g. using ((\d+\.){3}). That gives you two groups: the first captures 10.11.1. and the second captures the last repetition, 1.
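To make the three cases concrete, here is a minimal sketch that runs findall with each pattern on the string from the question. The variable names (full, last, nested) are only illustrative.

```python
import re

s = '64 bytes from 10.11.1.5: icmp_seq=2 ttl=128 time=215 ms'

# No capturing group: findall returns the full matches.
full = re.findall(r"(?:\d+\.){3}", s)
print(full)    # ['10.11.1.']

# One capturing group, repeated three times: findall returns only the
# group, and the group holds its last iteration.
last = re.findall(r"(\d+\.){3}", s)
print(last)    # ['1.']

# Two capturing groups: findall returns one tuple per match,
# ordered by opening parenthesis (outer group, inner group).
nested = re.findall(r"((\d+\.){3})", s)
print(nested)  # [('10.11.1.', '1.')]
```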
Upvotes: 1
Reputation: 371108
See docs for re.findall:
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.
Emphasis mine. There are no capturing groups in your first pattern, so it returns the one full match in the input provided, as a string:
['10.11.1.']
But with ([0-9]+\.){3}, you do have a capturing group, so rather than returning the full match as a string, it returns a list of groups. Remember that
A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
which is why only the last repetition of the group is seen in the result, as ['1.']. (The full match is not included; only the captured groups are.)
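If you would rather keep the original grep pattern unchanged, one option is to avoid findall and read the full match from each match object instead. This is a sketch of that alternative, not something the question requires; the names pattern, ips and last_groups are illustrative.

```python
import re

s = '64 bytes from 10.11.1.5: icmp_seq=2 ttl=128 time=215 ms'

# Same pattern as the grep example, capturing group included.
pattern = re.compile(r"([0-9]+\.){3}[0-9]+")

# group(0) is always the whole match, regardless of capturing groups.
ips = [m.group(0) for m in pattern.finditer(s)]
print(ips)          # ['10.11.1.5']

# group(1) still holds only the last iteration of the repeated group.
last_groups = [m.group(1) for m in pattern.finditer(s)]
print(last_groups)  # ['1.']
```

Note that this pattern only checks the shape of the address (four dot-separated runs of digits); it does not check that each number is in the range 0 to 255.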
Upvotes: 3