orak

Reputation: 2419

Python regex, capture groups that are not specific

Consider the following example strings:

abc1235abc53abcXX

123abc098YXabc

I want to capture the substrings that occur between the occurrences of abc,

e.g. I should get the following groups:

1235, 53, XX
123, 098YX

I'm trying this regex, but somehow it does not capture the in-between text:

(abc(.*?))+

What am I doing wrong?

EDIT: I need to do it using regex, no string splitting, since I need to apply further rules on the captured groups.

Upvotes: 1

Views: 55

Answers (3)

RomanPerekhrest

Reputation: 92854

A re.findall() approach with a specific regex pattern:

import re

strings = ['abc1235abc53abcXX', '123abc098YXabc']
pat = re.compile(r'(?:abc|^)(.+?)(?=abc|$)')    # prepared pattern

for s in strings:
    items = pat.findall(s)
    print(items)
    # further processing

The output:

['1235', '53', 'XX']
['123', '098YX']

  • (?:abc|^) - non-capturing group that matches either the abc substring OR the start of the string ^
  • (.+?) - capturing group that lazily matches one or more characters (as few as possible)
  • (?=abc|$) - positive lookahead assertion, ensures that the captured text is followed by either the abc sequence OR the end of the string $
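
As for why the pattern in the question captures nothing useful, here is a small sketch (the variable names are mine, not from the question). The lazy (.*?) has nothing after it that forces it to grow, so it settles for the empty string right after each abc. Separately, when a repeated group does iterate, Python's re keeps only the capture from the last iteration.

```python
import re

s = 'abc1235abc53abcXX'

# The pattern from the question: the lazy inner group stops immediately.
original = re.compile(r'(abc(.*?))+')
first = original.search(s)
print(repr(first.group(0)))   # 'abc'  (the whole match is just the delimiter)
print(repr(first.group(2)))   # ''     (the inner group captured nothing)

# The pattern from this answer: the lookahead gives the lazy group
# a point it has to reach before the match can succeed.
fixed = re.compile(r'(?:abc|^)(.+?)(?=abc|$)')
between = fixed.findall(s)
print(between)                # ['1235', '53', 'XX']
```

If positions are needed for the further rules, fixed.finditer(s) yields match objects, and m.span(1) gives the start and end of each captured piece.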

Upvotes: 5

Swadhikar

Reputation: 2200

Try splitting the string on abc, then drop the empty results with an if clause inside a list comprehension, as below:

[r for r in re.split('abc', s) if r]
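
A self-contained version of the same idea, applied to both strings from the question, might look like the sketch below (the names strings and parts are only for illustration):

```python
import re

strings = ['abc1235abc53abcXX', '123abc098YXabc']

# Split each string on the literal delimiter and keep only non-empty pieces.
parts = [[r for r in re.split('abc', s) if r] for s in strings]

for p in parts:
    print(p)
# ['1235', '53', 'XX']
# ['123', '098YX']
```

The filter is needed because a leading or trailing abc produces an empty string at that end of the split result.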

Upvotes: 0

Thierry Lathuille

Reputation: 24234

Use re.split:

import re

s = 'abc1235abc53abcXX'

re.split('abc', s)
# ['', '1235', '53', 'XX']

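Note that you get an empty string as the first item, representing the (empty) text before the first 'abc'. A string that ends with 'abc', like the second example in the question, likewise yields an empty string as the last item.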

Upvotes: 3
