jeff

Reputation: 151

non-capturing version of regular parentheses Python

My goal is to locate an IP address inside some text.

Using grep, I was able to do it with the regular expression ([0-9]+\.){3}[0-9]+.

With re from Python, I don't understand why it doesn't work unless I precede the expression inside the parentheses with ?:

I understand that the use of ?: will prevent the creation of a group, but I can't explain the result when this prefix is deleted.

>>> s
'64 bytes from 10.11.1.5: icmp_seq=2 ttl=128 time=215 ms'
>>> p=re.compile(r"(?:[0-9]+\.){3}")
>>> p.findall(s)
['10.11.1.']
>>> p=re.compile(r"([0-9]+\.){3}")
>>> p.findall(s)
['1.']

Upvotes: 4

Views: 1744

Answers (3)

addohm

Reputation: 2475

You could use the following:

import re
s = '64 bytes from 10.11.1.5: icmp_seq=2 ttl=128 time=215 ms'
r = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")
ip = r.findall(s)
print(ip)

Upvotes: 0

poke

Reputation: 388233

As per the documentation (emphasis mine):

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

Your first example uses a non-capturing group, so there is “no group” inside the pattern for findall to return. Instead, it returns a list of every substring where the full pattern matched. In your case that is 10.11.1., the first three octets with their trailing dots, because the pattern ends after the third dot.

In the second example, there is a capturing group, so this part of the documentation applies: "If one or more groups are present in the pattern, return a list of groups". Instead of a list of all full matches, you only get a list of groups.

But there is only a single group in your pattern. That group matches three times, yet a group can only hold a single value; that is a limitation of most regular expression engines, Python's re included. So for your example, only the last captured value is available in the findall result.

If you want to capture repeated groups, you will have to capture them in a separate, enclosing group, e.g. using ((\d+\.){3}). That will give you two groups. The first will capture 10.11.1. and the second only the last repetition, 1. (with its trailing dot).
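To make the nested-group behaviour concrete, here is a minimal sketch using the sample string from the question. The variable name nested is only illustrative.

```python
import re

s = '64 bytes from 10.11.1.5: icmp_seq=2 ttl=128 time=215 ms'

# Outer group: the whole repeated run. Inner group: only its last iteration.
nested = re.findall(r"((\d+\.){3})", s)
print(nested)  # [('10.11.1.', '1.')]
```

Because the pattern now has two groups, findall returns a list of tuples, one tuple per match.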

Upvotes: 1

CertainPerformance

Reputation: 371108

See docs for re.findall:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

The key sentence is "If one or more groups are present in the pattern, return a list of groups". There are no capturing groups in your first pattern, so it returns the one full match in the input provided, as a string:

['10.11.1.']

But with ([0-9]+\.){3}, you do have a capturing group, so rather than returning the full match as a string, it returns a list of groups. Remember that

A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data

which is why only the last repetition of the group is seen in the result, as ['1.']. (The full match is not included; only the captured groups are.)
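If you would rather keep the capturing parentheses, one option is to read the full match from the match objects instead of relying on findall. The sketch below assumes the grep-style pattern from the question, including the final [0-9]+ for the fourth octet; the names capturing, non_capturing and full_matches are illustrative only.

```python
import re

s = '64 bytes from 10.11.1.5: icmp_seq=2 ttl=128 time=215 ms'

# Capturing version: findall reports the group, finditer exposes the full match.
capturing = re.compile(r"([0-9]+\.){3}[0-9]+")
full_matches = [m.group(0) for m in capturing.finditer(s)]
print(full_matches)          # ['10.11.1.5']
print(capturing.findall(s))  # ['1.']

# Non-capturing version: findall returns the full match directly.
non_capturing = re.compile(r"(?:[0-9]+\.){3}[0-9]+")
print(non_capturing.findall(s))  # ['10.11.1.5']
```

Note that neither pattern checks that each octet is in the range 0 to 255; they only match the dotted-number shape.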

Upvotes: 3
