Reputation: 386
In the following word list
ABCD
AAAA
AAAD
AAAB
BBDA
CCCC
CCCA
DADA
BABC
...
all possible 256 combinations
Using regex, I want to select the words in which A or B, in any combination, occupy exactly 3 of the 4 positions.
Expected output:
AAAD
BBDA
BABC
I know that [AB]{4} matches a whole word made of A and B, but the condition of exactly 3 positions out of 4 is what confuses me.
Upvotes: 3
Views: 481
Reputation: 18357
Just rephrasing the matching rules to confirm that my solution adheres to them: three of the four positions are occupied by A and B, and the remaining position is occupied by C or D.
If this is correct, you can use this regex to match the strings you want.
^(?=[AB]*[CD][AB]*$).{4}$
Explanation of the above regex:
^ - Start of string
(?=[AB]*[CD][AB]*$) - Positive lookahead to ensure that either C or D appears exactly once in the string, so the other three positions are occupied by As and Bs
.{4}$ - Match four letters A to D using dot, as they are already validated to be A to D by the positive lookahead
Here is a regulex graph for better visualization
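As a quick check of the regex above, here is a minimal Python sketch that runs it over the sample words from the question. The names `pattern`, `sample_words`, and `selected` are illustrative and not part of the original answer.

```python
import re

# The regex from this answer: exactly one C or D, four characters in total.
pattern = re.compile(r"^(?=[AB]*[CD][AB]*$).{4}$")

# Sample words taken from the question.
sample_words = ["ABCD", "AAAA", "AAAD", "AAAB", "BBDA",
                "CCCC", "CCCA", "DADA", "BABC"]

selected = [word for word in sample_words if pattern.match(word)]
print(selected)  # ['AAAD', 'BBDA', 'BABC']
```

The lookahead alone does not fix the length, so it is the `.{4}$` part that rejects inputs such as `AAD` or `AAAAD`.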
Edit:
Detailed explanation of (?=[AB]*[CD][AB]*$)
A positive lookahead is written as (?=some regex). Contrary to normal regex matching and consumption, a lookaround (positive/negative lookahead/lookbehind) only checks the characters and does not consume them: as soon as the lookaround expression finishes, the regex position is reset to where it was before the lookaround began matching.
In this regex the expression inside the lookahead is [AB]*[CD][AB]*$. Here [AB]* matches any character in the set (A or B) zero or more times. It is followed by [CD], which must match exactly one character (there is no quantifier) from the set C or D. Then [AB]* again matches A or B zero or more times, and finally $ ensures that the end of the string is reached.
In summary, the logical meaning of this expression is that there is exactly one occurrence of either C or D, surrounded by As or Bs on either side as needed. Together with .{4}, this matches every four-letter combination that has only one occurrence of C or D.
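To illustrate the two points above (a lookahead consumes nothing, and the full regex accepts exactly the words with a single C or D), here is a small Python sketch. It assumes the 256 words are all four-letter combinations of A, B, C and D, as the question states; the variable names are made up for the example.

```python
import re
from itertools import product

# A lookahead on its own matches without advancing the position.
lookahead_only = re.compile(r"^(?=[AB]*[CD][AB]*$)")
m = lookahead_only.match("AAAD")
consumed = m.end()  # position after the match; the lookahead consumed nothing

# The full regex, checked against all 4-letter words over A-D.
full = re.compile(r"^(?=[AB]*[CD][AB]*$).{4}$")
words = ["".join(p) for p in product("ABCD", repeat=4)]

matched = [w for w in words if full.match(w)]
exactly_one_cd = [w for w in words if sum(ch in "CD" for ch in w) == 1]

print(consumed)                    # 0
print(matched == exactly_one_cd)   # True
print(len(matched))                # 64
```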
Also, I incorporated the suggestion by revo that [A-D] can be changed to just a dot (.).
Many thanks to revo.
Upvotes: 4
Reputation: 48751
Try the following regex:
^([^AB\r\n]*[AB]){3}(?!(?1)).*$
If recursion (the (?1) subroutine call) isn't supported by the engine you are working with, use this instead:
^(?:[^AB\r\n]*[AB]){3}(?![^AB\r\n]*[AB]).*$
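Python's built-in re module does not support the (?1) subroutine call, so the sketch below tries the second, non-recursive form on the sample words from the question. The variable names are illustrative only.

```python
import re

# Non-recursive variant: three A/B characters, then no further A or B on the line.
pattern = re.compile(r"^(?:[^AB\r\n]*[AB]){3}(?![^AB\r\n]*[AB]).*$")

sample_words = ["ABCD", "AAAA", "AAAD", "AAAB", "BBDA",
                "CCCC", "CCCA", "DADA", "BABC"]

selected = [word for word in sample_words if pattern.match(word)]
print(selected)  # ['AAAD', 'BBDA', 'BABC']

# The pattern counts A/B characters but does not fix the length or the other letters.
longer = bool(pattern.match("XAYBZA"))
print(longer)  # True
```

If the input can contain words that are not exactly four letters long, a separate length check would be needed with this pattern.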
Upvotes: 2
Reputation: 10054
This will do:
^([^AB][AB]{3}|[AB][^AB][AB]{2}|[AB]{2}[^AB][AB]|[AB]{3}[^AB])$
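The alternation above lists one branch per possible position of the non-A/B character. As a rough sketch, the same pattern can be generated for any word length; `build_pattern` is a hypothetical helper written for this example, and the check assumes the words are the 256 four-letter combinations of A to D.

```python
import re
from itertools import product

# The hand-written alternation from this answer.
HANDWRITTEN = re.compile(
    r"^([^AB][AB]{3}|[AB][^AB][AB]{2}|[AB]{2}[^AB][AB]|[AB]{3}[^AB])$"
)

def build_pattern(length=4):
    """Build one branch per position that holds the single non-A/B character."""
    branches = [
        "[AB]" * i + "[^AB]" + "[AB]" * (length - 1 - i)
        for i in range(length)
    ]
    return re.compile("^(?:" + "|".join(branches) + ")$")

generated = build_pattern()
words = ["".join(p) for p in product("ABCD", repeat=4)]

# Both patterns should accept exactly the same words.
same = all(bool(HANDWRITTEN.match(w)) == bool(generated.match(w)) for w in words)
count = sum(bool(generated.match(w)) for w in words)
print(same, count)  # True 64
```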
Upvotes: 2
Reputation: 273380
If I understood correctly, you want to match all the strings that have exactly three characters that are either A or B. This implies that the string will have exactly one character that is not A or B.
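You can do this by removing all the As and Bs from the string and checking whether the remaining string is exactly one character long:
    import re

    for string in all_your_strings:
        # Exactly one leftover character means every other character was A or B.
        if len(re.sub(r"[AB]", "", string)) == 1:
            print(string, "matches")
        else:
            print(string, "does not match")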
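The substitution check relies on every word being four letters long. As a hedged sketch, the same idea can be written as a direct count with an explicit length test; the function names `by_substitution` and `by_counting` are invented for this comparison.

```python
import re
from itertools import product

def by_substitution(word):
    # Approach from this answer: one character is left after removing A and B.
    return len(re.sub(r"[AB]", "", word)) == 1

def by_counting(word):
    # Same rule stated directly, with the word length made explicit.
    return len(word) == 4 and sum(ch in "AB" for ch in word) == 3

words = ["".join(p) for p in product("ABCD", repeat=4)]
agree = all(by_substitution(w) == by_counting(w) for w in words)
selected = [w for w in words if by_counting(w)]

print(agree)          # True
print(len(selected))  # 64
```

The two functions only differ on inputs of other lengths: `by_substitution("AAC")` is true, while `by_counting("AAC")` is false.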
Upvotes: 2