Reputation: 51
I have some regexes in named groups, such as (?P<a>A), (?P<b>B), and (?P<c>C). Then I have a sentence like some_word A C B, where A, B, and C can appear in any order. I need to match those groups only if some_word appears in front of them. In that case, I would like output like this: {a : "A", b : "B", c : "C"}.
I tried the regex some_word ((?P<a>A)\s|(?P<b>B)\s|(?P<c>C)\s){3}, but it does not work, as the group names have to be unique.
The only solution I have found is the regex some_word (?P<a>A|B|C)\s(?P<b>A|B|C)\s(?P<c>A|B|C). It handles the permutations of A, B, and C, but I lose the link {a : "A", b : "B", c : "C"}.
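To make the lost link concrete, here is a minimal sketch that runs that second pattern on the sample sentence. The variable names are only illustrative; the pattern and the text are the ones from the question.

```python
import re

# The second pattern from the question: every named group accepts any of A, B, C,
# so the group name reflects the position in the sentence, not the value matched.
pattern = re.compile(r'some_word (?P<a>A|B|C)\s(?P<b>A|B|C)\s(?P<c>A|B|C)')

match = pattern.search('some_word A C B')
positional = match.groupdict()
print(positional)  # {'a': 'A', 'b': 'C', 'c': 'B'}
```

Group "b" ends up holding C and group "c" holds B, which is not the mapping I want.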
Thank you for your help!
Upvotes: 0
Views: 1139
Reputation: 2706
If you want to match from some_word up to the last of A, B, or C, in any order, something like this works.
It matches the shortest string after some_word in which A, B, and C have each appeared at least once.
some_word(?:(?=(?P<a>A)()|(?P<b>B)()|(?P<c>C)()|.).)+?(?=\2\4\6)
https://regex101.com/r/Gu5TnB/1
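A minimal sketch of how this pattern might be used from Python, assuming the standard re module. The helper name named_groups is hypothetical. Each empty group () is set only when its letter has been seen, so the final lookahead (?=\2\4\6) can succeed only once all three have occurred.

```python
import re

# Pattern from this answer, unchanged. Groups 2, 4 and 6 are the empty "flag" groups.
PATTERN = re.compile(
    r'some_word(?:(?=(?P<a>A)()|(?P<b>B)()|(?P<c>C)()|.).)+?(?=\2\4\6)'
)

def named_groups(text):
    """Return the named groups as a dict, or None if there is no match.

    Hypothetical helper, not part of the original answer.
    """
    m = PATTERN.search(text)
    return m.groupdict() if m else None

print(named_groups('some_word A C B'))
print(named_groups('some_word A C'))  # B is missing, so no match is expected
```

Note that groupdict() returns only the named groups, so the empty flag groups do not show up in the result.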
Upvotes: 0
Reputation: 626691
You can use the second approach but restrict each group pattern with a negative lookahead to avoid matching repeated contents:
import re
text = 'some_word B C A'
for x in re.finditer(r'some_word\s+(?:(?P<a>A|B|C)\s+(?!(?P=a))(?P<b>A|B|C)\s+(?!(?P=a)|(?P=b))(?P<c>A|B|C))', text):
    print(x.group("a"))
    print(x.group("b"))
    print(x.group("c"))
See the Python demo, output:
B
C
A
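See the regex demo. In the (?:(?P<a>A|B|C)\s+(?!(?P=a))(?P<b>A|B|C)\s+(?!(?P=a)|(?P=b))(?P<c>A|B|C)) part, (?P<a>A|B|C) matches A, B, or C and captures it into Group "a". (?P<b>A|B|C) matches the same alternatives and captures into Group "b", but the (?!(?P=a)) lookahead means this value cannot start with the value in Group "a". Likewise, (?!(?P=a)|(?P=b)) keeps the value in Group "c" from starting with the value in Group "a" or Group "b".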
To make sure the whole values differ, not just their beginnings, you can add whitespace boundaries to the lookaheads:
r'some_word\s+(?:(?P<a>A|B|C)\s+(?!(?P=a)(?!\S))(?P<b>A|B|C)\s+(?!(?:(?P=a)|(?P=b))(?!\S))(?P<c>A|B|C))'
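The difference only shows up when one value is a prefix of another, which is not the case for A, B, and C. Here is a small sketch using hypothetical tokens foo, foobar, and baz (not from the question), cut down to two groups to keep it short:

```python
import re

# Hypothetical tokens where one value ("foo") is a prefix of another ("foobar").
TOKENS = r'foobar|foo|baz'

# Lookahead without a boundary: rejects anything that merely starts with group "a".
prefix_only = re.compile(
    rf'some_word\s+(?P<a>{TOKENS})\s+(?!(?P=a))(?P<b>{TOKENS})'
)
# Lookahead with a whitespace boundary: rejects only an identical whole token.
whole_token = re.compile(
    rf'some_word\s+(?P<a>{TOKENS})\s+(?!(?P=a)(?!\S))(?P<b>{TOKENS})'
)

text = 'some_word foo foobar'
prefix_result = prefix_only.search(text)
whole_result = whole_token.search(text)

print(prefix_result)             # None: "foobar" starts with "foo"
print(whole_result.groupdict())  # {'a': 'foo', 'b': 'foobar'}
```

Both versions still reject a true repeat such as some_word foo foo.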
Upvotes: 1
Reputation: 2123
You can use this pattern: (?<=some_word)(?=.*(?P<a>A).*)(?=.*(?P<b>B).*).*(?P<c>C).*
See Regex Demo
Code:
import re
pattern = "(?<=some_word)(?=.*(?P<a>A).*)(?=.*(?P<b>B).*).*(?P<c>C).*"
text = "some_word A C B"
matches = re.search(pattern, text)
print(matches.groupdict())
Output:
{'a': 'A', 'b': 'B', 'c': 'C'}
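Since re.search returns None when some_word is absent, calling groupdict() directly would raise an AttributeError in that case. A minimal sketch with a guard, where the helper name extract_groups is hypothetical:

```python
import re

# Same pattern as above, compiled once.
PATTERN = re.compile(r"(?<=some_word)(?=.*(?P<a>A).*)(?=.*(?P<b>B).*).*(?P<c>C).*")

def extract_groups(text):
    """Return {'a': ..., 'b': ..., 'c': ...}, or None when the pattern does not match.

    Hypothetical helper, not part of the original answer.
    """
    match = PATTERN.search(text)
    return match.groupdict() if match else None

print(extract_groups("some_word C B A"))  # {'a': 'A', 'b': 'B', 'c': 'C'}
print(extract_groups("other A C B"))      # None
```

Because each lookahead scans with .*, the letters may appear anywhere after some_word on the same line, not only directly after it.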
Upvotes: 1