Peter S

Reputation: 65

Regex matching with wildcard and where each character in expression can only be used once

I need some help to write a Regex for character matching. The scenario is that I have a text file with about 300 000 lines, with one word on each line. I need to find the words that match a certain set of characters.

Think of Scrabble as a very similar example, where a user has a set of characters, say for example P E S plus a wildcard character that can match any character (but only once).

If the text file contains the following words:

...only the words in bold should be matched, as each of the user's characters, including the wildcard, can be used at most once in matching.

Is there a way to write a regular expression for this?

I have started with...:

\b[P,E,S]\b

...but don't know how I should express that:

  1. Each character (P, E, S) can only be used once
  2. Any character (the wildcard) can also be used once

Thank you in advance! Please let me know if I need to clarify the problem.

// Peter

Upvotes: 1

Views: 700

Answers (2)

FailedDev

Reputation: 26940

Impossible is nothing :

You can do this with regexes using lookaheads:

(?=^.+$)(?=^[^P]*?P?[^P]*?$)(?=^[^E]*?E?[^E]*?$)(?=^[^S]*?S?[^S]*?$)

Basically, if you break it down, there are four components:

First lookahead :

(?=^.+$)

Checks that the length is at least 1.

Then three lookaheads of the form:

(?=^[^P]*?P?[^P]*?$)

one each for P, E and S, check that the character occurs at most once.

The above simply checks the whole string for at most one occurrence of P. If more than one P is found, the regex fails. The same applies to the following two lookaheads.
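To make the lookahead chain concrete, here is a minimal Python sketch that builds the same pattern from a set of letters and tries it on a few words. The helper names `at_most_once_pattern` and `passes` are hypothetical, and the sketch assumes the letters are distinct. Note that, as written, the pattern only limits P, E and S to one occurrence each; it does not restrict other letters, so a word such as XYZ also passes.

```python
import re

def at_most_once_pattern(letters):
    """Build a lookahead-only regex: the word must be non-empty and each
    letter in `letters` may occur at most once (letters assumed distinct)."""
    parts = ["(?=^.+$)"]  # length check: at least one character
    for ch in letters:
        c = re.escape(ch)
        # zero or one occurrence of this letter anywhere in the string
        parts.append(f"(?=^[^{c}]*?{c}?[^{c}]*?$)")
    return re.compile("".join(parts))

pattern = at_most_once_pattern("PES")

def passes(word):
    # The pattern consists only of lookaheads, so a successful match is empty;
    # we only care whether it succeeds at the start of the word.
    return pattern.match(word) is not None

for word in ["PIES", "PEEP", "XYZ"]:
    print(word, passes(word))
```

The generated pattern is identical to the one shown at the top of this answer, so the sketch only checks the "at most once" part; the wildcard is not handled here.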

For the wildcard I still have to think of a smart way to do it :)

Upvotes: 1

Petar Ivanov

Reputation: 93090

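This is not very easy with regex (if it is possible at all). Something like this would be much simpler:

// Requires: using System.Collections.Generic; using System.Linq;
// The letters the user holds; each one is removed from the list as it is used.
List<char> set = new List<char>("PES");

string s = "PIES";

// Count the characters the set cannot supply; the single wildcard covers at most one.
bool matches = s.Count(ch => !set.Remove(ch)) < 2;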
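The same counting idea can be sketched in Python for the whole word list. This is only an illustration under stated assumptions: the function names `can_form` and `matching_words` are hypothetical, matching is assumed to be case-insensitive, and the number of wildcards is a parameter that defaults to one.

```python
from collections import Counter

def can_form(word, rack, wildcards=1):
    """Return True if `word` can be built from the letters in `rack`,
    using each rack letter at most once plus up to `wildcards` free letters."""
    rack_counts = Counter(rack.upper())
    word_counts = Counter(word.upper())
    # Counter subtraction keeps only positive counts, i.e. the letters
    # the rack cannot supply and that a wildcard would have to cover.
    missing = word_counts - rack_counts
    return sum(missing.values()) <= wildcards

def matching_words(lines, rack, wildcards=1):
    """Yield the non-empty words from an iterable of lines that can be formed."""
    for line in lines:
        word = line.strip()
        if word and can_form(word, rack, wildcards):
            yield word

print(list(matching_words(["PIES\n", "PEEPS\n", "PEPS\n"], "PES")))
```

Because `matching_words` accepts any iterable of lines, an open file object for the 300 000-line word list can be passed in directly, and the words are checked one at a time without loading the whole file.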

Upvotes: 1
