vp_arth

Reputation: 14982

Python regexp: get all group's sequence

I have a regex like '^(a|ab|1|2)+$' and want to get the full sequence of substrings that the group matched.

For example, for re.search(reg, 'ab1') I want to get ('ab', '1').

I can get an equivalent result with the pattern '^(a|ab|1|2)(a|ab|1|2)$', but I don't know in advance how many blocks (pattern)+ will match.

Is this possible, and if yes - how?

Upvotes: 5

Views: 1403

Answers (3)

wedem

Reputation: 21

I think you don't need regexes for this problem; you need a recursive graph search function.
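A minimal sketch of what such a recursive search could look like, assuming the alternatives are plain literal strings (the function name split_tokens is hypothetical, not something from the question):

```python
def split_tokens(text, tokens):
    """Return one list of tokens that concatenates to text, or None.

    Hypothetical helper: tries each token as a prefix and recurses on
    the remainder, backtracking when the remainder cannot be split.
    """
    if not text:
        return []  # nothing left to split: success
    for tok in tokens:
        if text.startswith(tok):
            rest = split_tokens(text[len(tok):], tokens)
            if rest is not None:
                return [tok] + rest
    return None  # no token leads to a complete split


print(split_tokens('ab1', ['a', 'ab', '1', '2']))
```

It returns the first split it finds in the order the tokens are listed, so a string with several valid splits yields only one of them.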

Upvotes: 2

ebenpack

Reputation: 458

Your original expression does match the way you want it to; it just matches the entire string and doesn't capture a separate group for each individual match. With a repetition operator ('+', '*', '{m,n}'), the group gets overwritten on each repetition, and only the final match is saved. This is alluded to in the documentation:

If a group matches multiple times, only the last match is accessible.
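A short check of this behaviour with the pattern from the question, as a sketch using only the standard re module:

```python
import re

m = re.search(r'^(a|ab|1|2)+$', 'ab1')

# group(0) is the whole match; group(1) holds only the last
# repetition of the group, so the earlier 'ab' is not available.
print(m.group(0))
print(m.group(1))
print(m.groups())
```

Here the whole string matches, but the only captured value is the one from the final repetition.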

Upvotes: 3

nio

Reputation: 5289

Try this:

import re

# 'ab' is listed before 'a' so the longer alternative is tried first
r = re.compile('(ab|a|1|2)')
for i in r.findall('ab1'):
    print(i)

The ab option has been moved to be first, so the regex matches ab in preference to just a. The findall method applies your regular expression repeatedly and returns a list of the matched groups. In this simple example you get back a plain list of strings, one per match. If the pattern had more than one group, you would get back a list of tuples, each containing one string per group.

This should work for your second example:

pattern = '(7325189|7325|9087|087|18)'
text = '7325189087'  # not named str, which would shadow the built-in
res = re.compile(pattern).findall(text)
print(pattern, text, res)

I removed the ^ and $ anchors from the pattern because findall has to find more than one substring, so it must be able to match anywhere in the string. I also removed the +, so each match is a single occurrence of one of the alternatives.
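One caveat: without the anchors, findall silently skips any characters that no alternative matches, so it no longer checks that the whole string is made of those blocks. A rough sketch of one way to guard against that, assuming the pattern has exactly one capturing group (the helper name tokenize is hypothetical):

```python
import re


def tokenize(pattern, text):
    """Return the findall pieces only if they cover the whole text.

    Hypothetical helper; assumes pattern has exactly one capturing
    group, so findall returns a list of strings.
    """
    parts = re.findall(pattern, text)
    # If anything was skipped, the pieces will not rebuild the input.
    return parts if ''.join(parts) == text else None


print(tokenize('(ab|a|1|2)', 'ab1'))   # every character is covered
print(tokenize('(ab|a|1|2)', 'abx1'))  # 'x' is skipped, so rejected
print(tokenize('(a|ab|1|2)', 'ab1'))   # 'a' wins over 'ab', 'b' is skipped
```

The last call shows why the order of the alternatives matters here: findall does not backtrack across matches, so this check can reject a string that the original anchored pattern would accept.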

Upvotes: 4
