Rukshan Dangalla

Reputation: 2590

Regex pattern to match letter combination of a word

I am currently developing a puzzle game for kids in which the player has to select the correct word from a grid. I use a regex to match the word.

For example, I used ([D|E|C|K]){4} to match DECK, because the player should be able to select the letters in any order, not only D->E->C->K. The player may select KDEC, EDCK, KCED, or any other order.

I achieved this by using ([D|E|C|K]){4}.

But here I am facing an issue: this pattern also matches EEEE, DDDD, DKDK, etc., that is, any combination of 4 characters from the set.

Any idea how I can modify the regex to get the desired outcome?

Thanks in advance.


Upvotes: 1

Views: 2000

Answers (1)

Wiktor Stribiżew

Reputation: 626738

Basically, this is not a good job for a regex, because a regex has no simple way to track how many times each letter has already been used. You'd better follow a simple algorithm: split the input string into characters, sort them, and rejoin them into a string; do the same with the search string; then compare the results.

See a JavaScript demo with the word TALL:

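const strings = ['TALL','LATL','TLAL','TTAL','AATT','ATL','STL'];
const search = 'TALL';
// Sorted letters of the search word, e.g. 'TALL' -> 'ALLT'
const compare_with = search.split("").sort().join("");
for (const s of strings) {
    // true only when s contains exactly the same letters as the search word
    console.log(s, ':', s.split("").sort().join("") === compare_with);
}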
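For use inside the game, the same sort-and-compare idea could be wrapped in a small helper. This is only a sketch: the names `signature` and `isPermutationOf` are made up for illustration and do not come from the question.

```javascript
// Hypothetical helper: reduces a string to its sorted-letter "signature".
const signature = (s) => s.split("").sort().join("");

// Hypothetical helper: true when `selection` uses exactly the letters of
// `word`, each the same number of times, in any order.
function isPermutationOf(selection, word) {
    return selection.length === word.length &&
        signature(selection) === signature(word);
}

console.log(isPermutationOf("KDEC", "DECK")); // true
console.log(isPermutationOf("EEEE", "DECK")); // false
```

The length check is redundant for correctness, but it lets the function skip sorting when the selection obviously cannot match.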

Can we do it with a regex? In .NET, you may use the balancing groups construct, and it is a real solution, not a workaround.

Scenario 1: .NET regex engine specific solution

Assuming your search word is TALL, you may build a regex like

^(?:(T)|(A)|(L)|(L)){4}$(?<-1>)(?<-2>)(?<-3>)(?<-4>)

See the regex demo.

Details

  • ^ - start of string
  • (?:(T)|(A)|(L)|(L)){4} - a non-capturing group that matches 4 occurrences of
    • (T) - T pushed on to the Group 1 capture stack
    • |(A) - or A pushed on to the Group 2 capture stack
    • |(L) - or L pushed on to the Group 3 capture stack
    • |(L) - or L pushed on to the Group 4 capture stack
  • $ - end of string
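  • (?<-1>)(?<-2>)(?<-3>)(?<-4>) - Pop a value from each of the capturing group stacks. If any group's capture stack is empty, the pop fails and the engine backtracks; if no other assignment of letters to groups works, there is no match. Since there are 4 iterations and 4 groups, all four pops succeed only when each group has captured exactly once.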
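Since the pattern has one capturing group and one pop per letter, it can be generated from the search word. Below is a minimal sketch of such a generator, written in JavaScript to stay in the same language as the demo above; `buildDotNetPattern` is a hypothetical name. The function only produces the pattern string, which is meant to be passed to a .NET Regex. JavaScript's own RegExp has no balancing groups, so the string should not be passed to `new RegExp`.

```javascript
// Hypothetical helper: builds the .NET balancing-group pattern for `word`.
// The result is a plain string intended for the .NET regex engine only.
function buildDotNetPattern(word) {
    const letters = word.split("");
    // Escape regex metacharacters so every character is matched literally.
    const escape = (ch) => ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // One capturing group per letter: (T)|(A)|(L)|(L)
    const alternatives = letters.map((ch) => `(${escape(ch)})`).join("|");
    // One pop per group: (?<-1>)(?<-2>)...
    const pops = letters.map((_, i) => `(?<-${i + 1}>)`).join("");
    return `^(?:${alternatives}){${letters.length}}$${pops}`;
}

console.log(buildDotNetPattern("TALL"));
// ^(?:(T)|(A)|(L)|(L)){4}$(?<-1>)(?<-2>)(?<-3>)(?<-4>)
```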

Scenario 2: Lookahead-based workaround in case all characters are unique

You may match and capture each letter from the set into a separate capturing group and add a negative lookahead before each subsequent capturing group to avoid matching a letter that was already matched.

The regex will look like

^([DECK])(?!\1)([DECK])(?!\1|\2)([DECK])(?!\1|\2|\3)([DECK])$

See the regex demo

Details

  • ^ - start of string
  • ([DECK]) - Group 1: a letter, D, E, C or K
  • (?!\1) - the next char cannot be the one captured into Group 1
  • ([DECK]) - Group 2: a letter, D, E, C or K
  • (?!\1|\2)([DECK]) - Group 3: the next letter cannot be equal to the first or second one
  • (?!\1|\2|\3)([DECK]) - Group 4: the next letter cannot be equal to the first, second or third one
  • $ - end of string
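This pattern uses only backreferences and negative lookaheads, so it also works in JavaScript and can be built from the word. A minimal sketch follows, assuming the word consists of plain letters with no repeats; `buildUniqueLetterRegex` is a hypothetical name.

```javascript
// Hypothetical helper: builds the lookahead-based regex for a word whose
// letters are all unique. Assumes plain letters (no regex metacharacters).
function buildUniqueLetterRegex(word) {
    const letters = word.split("");
    if (new Set(letters).size !== letters.length) {
        throw new Error("This approach only works when all letters are unique");
    }
    const cls = `[${word}]`; // e.g. [DECK]
    const parts = letters.map((_, i) => {
        if (i === 0) return `(${cls})`;
        // Forbid every letter captured so far: \1|\2|...
        const refs = Array.from({ length: i }, (_, j) => `\\${j + 1}`).join("|");
        return `(?!${refs})(${cls})`;
    });
    return new RegExp(`^${parts.join("")}$`);
}

const re = buildUniqueLetterRegex("DECK");
for (const s of ["DECK", "KDEC", "EDCK", "EEEE", "DKDK", "DEC"]) {
    console.log(s, ":", re.test(s));
}
```

For a word with repeated letters such as TALL the helper throws, because the lookaheads would reject the second L.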

Upvotes: 2
