Bastien Déchamps

Reputation: 51

Capture multiple named groups in any order with regex

I have some regexes in named groups, such as (?P<a>A), (?P<b>B), (?P<c>C). Then I have a sentence like some_word A C B, with A, B and C in random order. I need to match those groups only if some_word appears in front of them. If this is the case, I would like to have an output like this: {a : "A", b : "B", c : "C"}.

I tried with the regex some_word ((?P<a>A)\s|(?P<b>B)\s|(?P<c>C)\s){3}, but it does not work, as the group names have to be unique.

The only solution I have found is to use the regex some_word (?P<a>A|B|C)\s(?P<b>A|B|C)\s(?P<c>A|B|C). It handles the permutations of A, B and C, but I lose the link {a : "A", b : "B", c : "C"}.
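To make the lost link concrete, here is a minimal sketch that runs the positional pattern above on the sample sentence. The variable names are illustrative only.

```python
import re

# The positional pattern from the question: every group accepts any of A, B, C.
pattern = re.compile(r"some_word (?P<a>A|B|C)\s(?P<b>A|B|C)\s(?P<c>A|B|C)")

match = pattern.search("some_word A C B")
result = match.groupdict()
print(result)  # {'a': 'A', 'b': 'C', 'c': 'B'}
```

The groups are filled by position, so "b" receives C and "c" receives B instead of each name keeping its own letter.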

Thank you for your help!

Upvotes: 0

Views: 1139

Answers (3)

sln

Reputation: 2706

If you want to match from some_word up to the point where A, B and C
have each appeared at least once, in any order, something like this works.
It matches the shortest string after some_word that contains
all three of A, B and C.

some_word(?:(?=(?P<a>A)()|(?P<b>B)()|(?P<c>C)()|.).)+?(?=\2\4\6)

https://regex101.com/r/Gu5TnB/1
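As a rough sketch of how this pattern could be used from Python's re module (the helper name capture and the test strings are assumptions, not part of the answer): each empty group () after a named group acts as a flag that is set only once that letter has been seen, and the final (?=\2\4\6) succeeds only when all three flags are set.

```python
import re

# Pattern from the answer above. Groups 2, 4 and 6 are the empty "flag" groups.
pattern = re.compile(
    r"some_word(?:(?=(?P<a>A)()|(?P<b>B)()|(?P<c>C)()|.).)+?(?=\2\4\6)"
)

def capture(text):
    """Return the named groups as a dict, or None when there is no match."""
    m = pattern.search(text)
    return m.groupdict() if m else None

full = capture("some_word A C B")   # all three letters present
missing = capture("some_word A C")  # B never appears
print(full)
print(missing)
```

Because a backreference to a group that never participated fails in Python, a sentence that lacks one of the letters should produce no match at all.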

Upvotes: 0

Wiktor Stribiżew

Reputation: 626691

You can use the second approach but restrict each group pattern with a negative lookahead to avoid matching repeated contents:

import re
text = 'some_word B C A'
for x in re.finditer(r'some_word\s+(?:(?P<a>A|B|C)\s+(?!(?P=a))(?P<b>A|B|C)\s+(?!(?P=a)|(?P=b))(?P<c>A|B|C))', text):
    print(x.group("a"))
    print(x.group("b"))
    print(x.group("c"))

See the Python demo, output:

B
C
A

See the regex demo. In this pattern, (?P<a>A|B|C) captures A, B or C into Group "a". (?P<b>A|B|C) captures the same alternatives into Group "b", but the (?!(?P=a)) lookahead in front of it rejects any value that starts with the text already captured in Group "a". Likewise, (?!(?P=a)|(?P=b)) keeps Group "c" from starting with the value of Group "a" or Group "b".

To reject only values that are fully equal, rather than values that merely start with the same text, add a right-hand whitespace boundary, (?!\S), to the lookaheads:

r'some_word\s+(?:(?P<a>A|B|C)\s+(?!(?P=a)(?!\S))(?P<b>A|B|C)\s+(?!(?:(?P=a)|(?P=b))(?!\S))(?P<c>A|B|C))'
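The difference only shows up when one allowed value is a prefix of another. The sketch below assumes a hypothetical token set A, AB, C (not from the question) and compares the two variants on the same input.

```python
import re

# Hypothetical tokens, chosen so that "A" is a prefix of "AB".
TOKENS = r"A|AB|C"

# Without a boundary: a value is rejected if it merely STARTS like an earlier one.
prefix_check = re.compile(
    rf"some_word\s+(?P<a>{TOKENS})\s+(?!(?P=a))(?P<b>{TOKENS})\s+"
    rf"(?!(?P=a)|(?P=b))(?P<c>{TOKENS})"
)

# With (?!\S): a value is rejected only when it EQUALS an earlier one.
whole_check = re.compile(
    rf"some_word\s+(?P<a>{TOKENS})\s+(?!(?P=a)(?!\S))(?P<b>{TOKENS})\s+"
    rf"(?!(?:(?P=a)|(?P=b))(?!\S))(?P<c>{TOKENS})"
)

text = "some_word A AB C"
without_boundary = prefix_check.search(text)
with_boundary = whole_check.search(text)
repeated = whole_check.search("some_word A A C")

print(without_boundary)           # None: "AB" starts with "A", so it is rejected
print(with_boundary.groupdict())  # {'a': 'A', 'b': 'AB', 'c': 'C'}
print(repeated)                   # None: a true repeat is still rejected
```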

Upvotes: 1

Alireza

Reputation: 2123

You can use this pattern: (?<=some_word)(?=.*(?P<a>A).*)(?=.*(?P<b>B).*).*(?P<c>C).*

See Regex Demo

Code:

import re

pattern = "(?<=some_word)(?=.*(?P<a>A).*)(?=.*(?P<b>B).*).*(?P<c>C).*"
text = "some_word A C B"
matches = re.search(pattern, text)
print(matches.groupdict())

Output:

{'a': 'A', 'b': 'B', 'c': 'C'}
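As a quick check that the order does not matter, the following sketch runs the same pattern over every permutation of the three letters. The loop and the two negative test strings are additions for illustration, not part of the answer.

```python
import re
from itertools import permutations

pattern = re.compile(r"(?<=some_word)(?=.*(?P<a>A).*)(?=.*(?P<b>B).*).*(?P<c>C).*")

# Try all six orderings of A, B and C after some_word.
results = {}
for order in permutations("ABC"):
    text = "some_word " + " ".join(order)
    results[text] = pattern.search(text).groupdict()

for text, groups in results.items():
    print(text, "->", groups)

# No match without the some_word prefix, or when one letter is absent.
no_prefix = pattern.search("other_word A C B")
missing = pattern.search("some_word A B")
print(no_prefix, missing)  # None None
```

Each lookahead scans the rest of the line independently, so every name stays tied to its own letter regardless of position.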

Upvotes: 1
