kashiff007

Reputation: 386

Match A or B at exactly 3 positions in a 4-letter word, with all possible combinations

In the following word list:

ABCD
AAAA
AAAD
AAAB
BBDA
CCCC
CCCA
DADA
BABC

...
(all 256 possible combinations)

Using regex, I want to select the words that have A or B, in any combination, in exactly 3 of the 4 positions.

Expected output:

AAAD
BBDA
BABC

I know that [AB]{4} matches a whole word made only of A and B, but restricting the match to exactly 3 of the 4 positions is where I get confused.
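For reference, the expected matches can be generated without any regex. The Python sketch below assumes the alphabet is exactly A to D (which is what gives 4**4 = 256 combinations) and uses a hypothetical helper, has_three_ab, that simply counts A/B positions. It can serve as a cross-check for any candidate regex.

```python
from itertools import product

def has_three_ab(word):
    # Hypothetical helper: a word qualifies when exactly three of its
    # four positions hold A or B.
    return len(word) == 4 and sum(ch in "AB" for ch in word) == 3

# All 4**4 = 256 words over the assumed alphabet A-D.
all_words = ["".join(p) for p in product("ABCD", repeat=4)]
matches = [w for w in all_words if has_three_ab(w)]

print(len(all_words))  # 256
print(len(matches))    # 64: 4 positions for the odd letter x 2 choices (C or D) x 2**3 A/B fillings
```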

Upvotes: 3

Views: 481

Answers (4)

Pushpesh Kumar Rajwanshi

Reputation: 18357

Just to restate the matching rules you described, so that my solution follows them:

  • three of the four positions must be occupied by either A or B
  • exactly one position is reserved for C or D

If this is correct, you can use this regex to match the strings you want.

^(?=[AB]*[CD][AB]*$).{4}$

Explanation of above regex:

  • ^ - start of string
  • (?=[AB]*[CD][AB]*$) - positive lookahead ensuring that C or D appears exactly once in the string, so the remaining positions are occupied by As and Bs
  • .{4}$ - match exactly four characters followed by the end of string; the dot is safe here because the lookahead has already verified that every character is A to D
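One way to check the pattern, sketched here with Python's standard re module and the sample word list from the question (the list and variable names are just for illustration):

```python
import re

# The lookahead allows exactly one C or D among A/B letters,
# and .{4}$ then fixes the length at four characters.
pattern = re.compile(r"^(?=[AB]*[CD][AB]*$).{4}$")

words = ["ABCD", "AAAA", "AAAD", "AAAB", "BBDA", "CCCC", "CCCA", "DADA", "BABC"]
selected = [w for w in words if pattern.match(w)]
print(selected)  # ['AAAD', 'BBDA', 'BABC']
```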


[Image: Regulex graph visualizing the regex above]

Edit: Detailed explanation of (?=[AB]*[CD][AB]*$)

A positive lookahead is written as (?=some regex). Unlike normal matching, a lookaround (positive or negative lookahead or lookbehind) only checks that its expression matches; it does not consume any characters. As soon as the lookaround is finished, the regex engine's position is reset to where it was before the lookaround began.

Inside this lookahead the expression is [AB]*[CD][AB]*$. Here [AB]* matches A or B zero or more times; [CD] then matches exactly one character (there is no quantifier) that is either C or D; the second [AB]* again matches A or B zero or more times; and finally $ ensures that the end of the string has been reached.

In summary, this expression means that there is exactly one occurrence of C or D, surrounded on either side by as many As or Bs as needed. Combined with .{4}$, this matches every four-letter combination that contains exactly one C or D.
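The zero-width behaviour of the lookahead can be observed directly. This Python sketch (using the standard re module; the variable names are illustrative) shows that the lookahead alone consumes nothing and does not limit the length, which is why the pattern still needs .{4}$ after it.

```python
import re

lookahead_only = re.compile(r"^(?=[AB]*[CD][AB]*$)")
full = re.compile(r"^(?=[AB]*[CD][AB]*$).{4}$")

m = lookahead_only.match("AAAD")
print(m.end())  # 0: the lookahead succeeded but consumed no characters

# The lookahead by itself accepts a longer word with one C or D ...
print(bool(lookahead_only.match("AAAAAAD")))  # True
# ... so .{4}$ is what restricts the match to four letters.
print(bool(full.match("AAAAAAD")))  # False
```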

I have also incorporated revo's suggestion that [A-D] can be replaced with just a dot (.). Many thanks to revo.

Upvotes: 4

revo

Reputation: 48751

Try the following regex:

^([^AB\r\n]*[AB]){3}(?!(?1)).*$


If your engine doesn't support subroutine calls such as (?1), which re-applies the pattern of capturing group 1, use this instead:

^(?:[^AB\r\n]*[AB]){3}(?![^AB\r\n]*[AB]).*$

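Python's built-in re module does not support the (?1) subroutine call, so this sketch uses the fallback pattern. The [^AB\r\n] classes keep each attempt on a single line, which suggests the pattern is meant to run over multi-line text with multiline mode on; that usage is an assumption here. Note that the pattern counts A/B letters per line rather than checking the length, which is fine when every word has four letters.

```python
import re

# Fallback pattern (no subroutine call), with ^ and $ anchored per line.
pattern = re.compile(r"^(?:[^AB\r\n]*[AB]){3}(?![^AB\r\n]*[AB]).*$", re.MULTILINE)

# The sample words from the question, one per line.
text = "\n".join(["ABCD", "AAAA", "AAAD", "AAAB", "BBDA", "CCCC", "CCCA", "DADA", "BABC"])
print(pattern.findall(text))  # ['AAAD', 'BBDA', 'BABC']
```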

Upvotes: 2

fphilipe

Reputation: 10054

This will do; it lists one alternative for each position the non-A/B character can occupy:

^([^AB][AB]{3}|[AB][^AB][AB]{2}|[AB]{2}[^AB][AB]|[AB]{3}[^AB])$
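Because the alternation enumerates positions explicitly, it is easy to cross-check. This Python sketch assumes the alphabet is A to D and compares the pattern against a plain count of A/B letters on all 256 four-letter words; an empty list means the two agree everywhere.

```python
import re
from itertools import product

# One branch per position of the single non-A/B letter.
pattern = re.compile(r"^([^AB][AB]{3}|[AB][^AB][AB]{2}|[AB]{2}[^AB][AB]|[AB]{3}[^AB])$")

words = ["".join(p) for p in product("ABCD", repeat=4)]
# Words where the regex and the "exactly three A/B letters" rule disagree.
disagreements = [w for w in words
                 if bool(pattern.match(w)) != (sum(ch in "AB" for ch in w) == 3)]
print(disagreements)  # []
```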

Upvotes: 2

Sweeper

Reputation: 273380

If I understood correctly, you want to match all the strings that have exactly three characters that are either A or B. Since each string is four characters long, this implies that the string will have exactly one character that is not A or B.

You can do this by removing all the As and Bs from the string (replacing them with an empty string) and checking whether exactly one character remains:

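import re

all_your_strings = ["ABCD", "AAAA", "AAAD", "AAAB", "BBDA", "CCCC", "CCCA", "DADA", "BABC"]  # your word list

for string in all_your_strings:
    # Remove every A and B; exactly one leftover character means a match.
    if len(re.sub(r"[AB]", "", string)) == 1:
        print(string, "matches")
    else:
        print(string, "does not match")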
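If the input might contain words that are not four letters long, the same re.sub idea can be wrapped in a small helper with an explicit length check. The helper name is_match is hypothetical; this is a sketch, not part of the original answer.

```python
import re

def is_match(word):
    # Hypothetical helper: remove every A and B; a four-letter word with
    # exactly one leftover character has A or B in exactly three positions.
    return len(word) == 4 and len(re.sub(r"[AB]", "", word)) == 1

words = ["ABCD", "AAAA", "AAAD", "AAAB", "BBDA", "CCCC", "CCCA", "DADA", "BABC"]
print([w for w in words if is_match(w)])  # ['AAAD', 'BBDA', 'BABC']
print(is_match("AAAAAAD"))  # False: one leftover letter, but the word is too long
```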

Upvotes: 2
