Reputation: 1369
When parentheses are used in the pattern in the program below, the output is ['www.google.com']:
import re
teststring = "href=\"www.google.com\""
m=re.findall('href="(.*?)"',teststring)
print(m)
If the parentheses are removed from the pattern passed to findall, the output is ['href="www.google.com"']:
import re
teststring = "href=\"www.google.com\""
m=re.findall('href=".*?"',teststring)
print(m)
It would be helpful if someone could explain how this works.
Upvotes: 2
Views: 80
Reputation: 1121346
The re.findall() documentation is quite clear on the difference:
Return all non-overlapping matches of pattern in string, as a list of strings. […] If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
So .findall() returns a list containing one of three types of values, depending on the number of groups in the pattern:
- No capturing groups in the pattern (no (...) parentheses): the whole matched string ('href="www.google.com"' in your second example).
- One capturing group in the pattern: the text captured by that group ('www.google.com' in your first example).
- More than one capturing group in the pattern: a tuple of all the captured groups.
Use non-capturing groups ((?:...)) if you don't want that behaviour, or add groups if you want more information. For example, adding a group around the href= part would result in a list of tuples with two elements each:
>>> re.findall('(href=)"(.*?)"', teststring)
[('href=', 'www.google.com')]
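To see the cases side by side, here is a minimal sketch that reuses the test string from the question. The variable names (no_group, one_group, and so on) are made up for illustration, and the re.finditer() line is an extra suggestion, not something the question asked for, for when you want both the whole match and the captured group.

```python
import re

teststring = 'href="www.google.com"'

# No capturing group: each result is the whole matched string.
no_group = re.findall(r'href=".*?"', teststring)

# One capturing group: each result is only the text captured by that group.
one_group = re.findall(r'href="(.*?)"', teststring)

# Two capturing groups: each result is a tuple with one entry per group.
two_groups = re.findall(r'(href=)"(.*?)"', teststring)

# A non-capturing group (?:...) groups without capturing,
# so findall falls back to returning the whole match.
non_capturing = re.findall(r'(?:href=)".*?"', teststring)

# finditer yields match objects, giving access to the whole match
# (group 0) and the captured group (group 1) at the same time.
both = [(m.group(0), m.group(1)) for m in re.finditer(r'href="(.*?)"', teststring)]

print(no_group)
print(one_group)
print(two_groups)
print(non_capturing)
print(both)
```

The first two results match the two outputs shown in the question; the non-capturing variant behaves like the pattern without parentheses.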
Upvotes: 5