Reputation: 224
I want to split a string with re.findall(). I tried the following:
str="<abc>somechars<*><def>somechars<*><ghj>somechars<*><ijk>somechars<*>"
print(re.findall('<(abc|ghj)>.*?<*>',str))
The output should be:
['<abc>somechars<*>','<ghj>somechars<*>']
In Notepad this expression gives the right result, but in Python I get:
['abc', 'ghj']
Any idea? Thanks for the answers.
Upvotes: 1
Views: 126
Reputation: 10350
You're capturing (abc|ghj). Use a non-capturing group (?:abc|ghj) instead.
Also, you should escape the second * in your regex since you want a literal asterisk: <\*> rather than <*>.
>>> s = '<abc>somechars<*><def>somechars<*><ghj>somechars<*><ijk>somechars<*>'
>>> re.findall(r'<(?:abc|ghj)>.*?<\*>', s)
['<abc>somechars<*>', '<ghj>somechars<*>']
Also, avoid shadowing the built-in name str.
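To illustrate why the escape matters even though the unescaped pattern happens to work on the sample input: an unescaped <*> means "zero or more < followed by >", so any stray > ends the match early. A minimal sketch, assuming a made-up input string that contains such a stray > (the names loose and strict are only for illustration):

```python
import re

# Hypothetical input with a stray '>' inside the first chunk.
s = '<abc>a>b<*><def>c<*>'

# Unescaped: '<*' means "zero or more '<'", so a bare '>' already ends the match.
loose = re.findall(r'<(?:abc|ghj)>.*?<*>', s)

# Escaped: '\*' is a literal asterisk, so only the full '<*>' terminator ends it.
strict = re.findall(r'<(?:abc|ghj)>.*?<\*>', s)

print(loose)   # ['<abc>a>']
print(strict)  # ['<abc>a>b<*>']
```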
Upvotes: 1
Reputation: 1482
Just make the group a non-capturing group:
import re
s = "<abc>somechars<*><def>somechars<*><ghj>somechars<*><ijk>somechars<*>"
# (?:...) groups without capturing; \* matches a literal asterisk
print(re.findall(r'<(?:abc|ghj)>.*?<\*>', s))
When the pattern contains capturing groups, re.findall() returns the contents of those groups, from left to right, instead of the entire match. Since you specified a group, it left out the entire match.
From the Python documentation:
"Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match."
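The three cases described in that passage can be seen side by side. This is just a small sketch on a shortened sample string; the variable names are illustrative and not part of the question:

```python
import re

s = '<abc>x<*><ghj>y<*>'

# No capturing group: findall() returns the whole matches.
no_group = re.findall(r'<(?:abc|ghj)>.*?<\*>', s)

# One capturing group: findall() returns only that group's text.
one_group = re.findall(r'<(abc|ghj)>.*?<\*>', s)

# Two capturing groups: findall() returns a list of tuples.
two_groups = re.findall(r'<(abc|ghj)>(.*?)<\*>', s)

print(no_group)    # ['<abc>x<*>', '<ghj>y<*>']
print(one_group)   # ['abc', 'ghj']
print(two_groups)  # [('abc', 'x'), ('ghj', 'y')]
```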
Upvotes: 0
Reputation: 67968
(<(?:abc|ghj)>.*?<\*>)
Try this. See the demo:
http://regex101.com/r/kP8uF5/12
import re
# The outer parentheses are the only capturing group, so findall() returns each whole match.
p = re.compile(r'(<(?:abc|ghj)>.*?<\*>)', re.IGNORECASE | re.MULTILINE)
test_str = "<abc>somechars<*><def>somechars<*><ghj>somechars<*><ijk>somechars<*>"
print(re.findall(p, test_str))
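If you would rather keep the original capturing group (for example, to also get the tag name), another option is re.finditer(), which yields match objects; group(0) is then the whole match. This is an alternative sketch rather than what the pattern above does, and the names whole and tags are made up for the example:

```python
import re

s = '<abc>somechars<*><def>somechars<*><ghj>somechars<*><ijk>somechars<*>'
pattern = re.compile(r'<(abc|ghj)>.*?<\*>')

# group(0) is the entire match; group(1) is the captured tag name.
whole = [m.group(0) for m in pattern.finditer(s)]
tags = [m.group(1) for m in pattern.finditer(s)]

print(whole)  # ['<abc>somechars<*>', '<ghj>somechars<*>']
print(tags)   # ['abc', 'ghj']
```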
Upvotes: 3