Reputation: 2419
Consider the following example strings:
abc1235abc53abcXX
123abc098YXabc
I want to capture the text that occurs between the abc delimiters,
e.g. I should get the following groups:
1235, 53, XX
123, 098YX
I'm trying this regex, but somehow it does not capture the in-between text:
(abc(.*?))+
What am I doing wrong?
EDIT: I need to do it using regex, no string splitting, since I need to apply further rules on the captured groups.
Upvotes: 1
Views: 55
Reputation: 92854
A re.findall() approach with a specific regex pattern:
import re
strings = ['abc1235abc53abcXX', '123abc098YXabc']
pat = re.compile(r'(?:abc|^)(.+?)(?=abc|$)') # prepared pattern
for s in strings:
    items = pat.findall(s)
    print(items)
    # further processing
The output:
['1235', '53', 'XX']
['123', '098YX']
(?:abc|^) - non-capturing group that matches either the abc substring OR the start of the string ^
(.+?) - capturing group that matches any character sequence, as few characters as possible
(?=abc|$) - positive lookahead assertion, ensures that the previously matched item is followed by either the abc sequence OR the end of the string $
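As for why the pattern from the question captures nothing between the delimiters, here is a minimal sketch (the variable names original and fixed are only for illustration). The lazy (.*?) is allowed to match zero characters, and nothing after it forces it to grow, so each match is a bare abc with an empty inner group. The lookahead is what pushes the lazy group to extend up to the next abc or the end of the string.

```python
import re

s = 'abc1235abc53abcXX'

# Pattern from the question: the lazy group stops at zero characters,
# because nothing after it requires more text to be consumed.
original = re.findall(r'(abc(.*?))+', s)
print(original)  # [('abc', ''), ('abc', ''), ('abc', '')]

# Pattern from this answer: the lookahead forces the lazy group to
# keep expanding until the next 'abc' or the end of the string.
fixed = re.findall(r'(?:abc|^)(.+?)(?=abc|$)', s)
print(fixed)  # ['1235', '53', 'XX']
```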
Upvotes: 5
Reputation: 2200
Try splitting the string by abc and then remove the empty results with an if clause inside a list comprehension, as below:
[r for r in re.split('abc', s) if r]
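A runnable sketch of the same idea applied to both example strings from the question (the names strings and results are only for illustration):

```python
import re

strings = ['abc1235abc53abcXX', '123abc098YXabc']

results = []
for s in strings:
    # re.split yields '' where 'abc' sits at the start or end of s;
    # the 'if r' filter drops those empty pieces.
    parts = [r for r in re.split('abc', s) if r]
    results.append(parts)
    print(parts)
# ['1235', '53', 'XX']
# ['123', '098YX']
```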
Upvotes: 0
Reputation: 24234
Use re.split:
import re
s = 'abc1235abc53abcXX'
re.split('abc', s)
# ['', '1235', '53', 'XX']
Note that the first item is an empty string, representing the (empty) text before the first 'abc'.
Upvotes: 3