FailSafe

Reputation: 482

Python regex: return both results when two alternatives partially overlap, WITHOUT IF statements or test groups, and NOT AS A TUPLE

I'm going to have quite a few questions about regex in the coming days. Out of 10 challenges I gave myself over the past 5 days, I managed to solve 6.

I'm hoping the following isn't simple and embarrassing, but what I'm trying to do is use re.findall to return results for both conditions, even though the condition for set 2 may already have been partially satisfied by set 1.

Example (Problem):

>>> str = 'ab1cd2efg1hij2k'
>>> re.findall('ab1cd|ab', str)
['ab1cd']
>>> re.findall('ab|ab1cd', str)
['ab']

Notice that whichever alternative comes first in the OR determines the single element of the returned list. What I want is to get both back as a 2-element list, preferably not a tuple. The reading I've done on regex ANDing has focused on making regexes match 2 different strings, as opposed to returning multiple results that may partially satisfy one another. Below is what I want.

Desired Output:

>>> str = 'ab1cd2efg1hij2k'
>>> re.findall('{SOMETHING_THAT_RETURNS_BOTH}', str)
['ab', 'ab1cd']

The closest I've gotten is the following:

>>> re.findall('ab|[\S]+?(?=2e)', str)
['ab', '1cd']
>>> re.findall('ab|[\S]+(?=2e)', str)
['ab', '1cd']

but the second alternative skips ab. Is there a directive in regex that says "restart from the beginning"? (?:^) seems to behave the same as ^, and using it in several ways hasn't helped so far. Please note that I DO NOT want to use regex IF statements or test whether a previous group matched just yet, because I'm not quite ready to learn those methods before forming a more solid foundation for the things I don't yet know.

Thanks so much.

Upvotes: 1

Views: 1050

Answers (2)

l'L'l

Reputation: 47264

Looking at the desired output, the regex pattern shouldn't really require any lookaheads:

import re
str = 'ab1cd2efg1hij2k1cd'
res = re.findall(r'((ab)?1cd)', str)
[list(row) for row in res][0]

The ? quantifier matches zero or one times, as many times as possible, giving back as needed (greedy). Here it makes the inner group (ab) optional, while the outer group captures the whole match.

Result:

['ab1cd', 'ab']
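
To make it clearer what findall hands back here, below is a minimal sketch using the same sample string. The variable names text, matches and first are my own (text is used instead of str to avoid shadowing the built-in). With more than one group in the pattern, findall returns one tuple per match, and a group that did not take part in a match shows up as an empty string.

```python
import re

# Same sample string as above; renamed so the built-in str is not shadowed.
text = 'ab1cd2efg1hij2k1cd'

# Two groups in the pattern, so findall returns a list of tuples,
# one tuple per non-overlapping match.
matches = re.findall(r'((ab)?1cd)', text)
print(matches)  # [('ab1cd', 'ab'), ('1cd', '')]

# Take the first match and turn the tuple into a list.
first = list(matches[0])
print(first)    # ['ab1cd', 'ab']
```

The second tuple comes from the trailing 1cd, where the optional (ab) group matched nothing.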

Upvotes: 1

anubhava

Reputation: 785611

If you can relax the tuple requirement, then the following regex with 2 independent lookaheads does the job. Lookaheads are needed because you want to capture overlapping text:

>>> print(re.search(r'(?=(ab1cd))(?=(ab))', str).groups())
('ab1cd', 'ab')

Both lookaheads have a capturing group thus giving us required output.
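
As a rough illustration of why this works, the sketch below (variable names text and m are mine) shows that the lookaheads consume no characters: the overall match is empty and sits at position 0, yet both groups still capture their text from that same position.

```python
import re

text = 'ab1cd2efg1hij2k'  # sample string from the question

m = re.search(r'(?=(ab1cd))(?=(ab))', text)

# Lookaheads are zero-width, so the match itself is the empty string.
print(repr(m.group(0)))  # ''
print(m.span())          # (0, 0)

# Both lookaheads were tested at the same position and both captured.
print(m.groups())        # ('ab1cd', 'ab')
```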

You can also use findall:

>>> print(re.findall(r'(?=(ab1cd))(?=(ab))', str)[0])
('ab1cd', 'ab')
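
If a plain list in the order shown in the question is what you are after, one possible way (a sketch, not the only option) is to swap the order of the two lookaheads and convert the tuple with list(). The names text, pairs and result are my own.

```python
import re

text = 'ab1cd2efg1hij2k'  # sample string from the question

# Groups are numbered left to right, so putting (ab) first
# yields the captures in the order the question asks for.
pairs = re.findall(r'(?=(ab))(?=(ab1cd))', text)
print(pairs)   # [('ab', 'ab1cd')]

# Convert the first (and here only) tuple to a list; guard against no match.
result = list(pairs[0]) if pairs else []
print(result)  # ['ab', 'ab1cd']
```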

Upvotes: 1
