Reputation: 65
I need some help to write a Regex for character matching. The scenario is that I have a text file with about 300 000 lines, with one word on each line. I need to find the words that match a certain set of characters.
Think of Scrabble as a very similar example, where a user has a set of characters, say for example P E S plus a wildcard character that can match any character (but only once).
If the text file contains the following words:
...only the words in bold should be matched, as each of the user's characters, including the wildcard, can be used at most once in a match.
Is there a way to write a regular expression for this?
I have started with...:
\b[P,E,S]\b
...but don't know how I should express that:
Thank you in advance! Please let me know if I need to clarify the problem.
// Peter
Upvotes: 1
Views: 700
Reputation: 26940
Impossible is nothing :
You can do this with regexes using lookaheads:
(?=^.+$)(?=^[^P]*?P?[^P]*?$)(?=^[^E]*?E?[^E]*?$)(?=^[^S]*?S?[^S]*?$)
Basically, if you break it down, the pattern consists of four lookaheads:
First lookahead:
(?=^.+$)
Checks that the line is at least one character long.
Then the three per-letter parts, shown here for P:
(?=^[^P]*?P?[^P]*?$)
and likewise for E and S. Each one checks that its letter occurs at most once.
The lookahead above scans the whole string and allows at most one P: if a second P is found, the regex fails. The same applies to the E and S lookaheads.
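To see what these lookaheads do and do not enforce, here is a minimal sketch that runs the pattern against a few words. The sample words are hypothetical and not taken from the question. Note that the pattern only limits P, E and S to one occurrence each; it does not restrict any other letters, so it does not yet model a single wildcard.

```csharp
using System;
using System.Text.RegularExpressions;

class LookaheadDemo
{
    static void Main()
    {
        // The pattern from above: a non-empty line with at most one P, one E and one S.
        var pattern = new Regex(
            @"(?=^.+$)(?=^[^P]*?P?[^P]*?$)(?=^[^E]*?E?[^E]*?$)(?=^[^S]*?S?[^S]*?$)");

        // Hypothetical sample words, not from the question.
        string[] words = { "PES", "PIES", "SEE", "PEEP" };

        foreach (string word in words)
        {
            // IsMatch succeeds only if all four lookaheads hold at the start of the word.
            Console.WriteLine($"{word}: {pattern.IsMatch(word)}");
        }
        // Prints:
        // PES: True
        // PIES: True
        // SEE: False
        // PEEP: False
    }
}
```

"SEE" and "PEEP" are rejected because a letter repeats, while "PIES" passes only because the pattern says nothing about the letter I.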
For the wildcard, I still have to think of a smart way to do it :)
Upvotes: 1
Reputation: 93090
This is not very easy with regex (if at all possible). Much simpler would be something like this:
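// Needs: using System.Collections.Generic; using System.Linq;
List<char> set = new List<char>("PES"); // the user's letters, each usable once
string s = "PIES";
// Remove returns false for a letter that is not (or no longer) in the set; one such miss is allowed for the wildcard.
bool matches = s.Count(ch => !set.Remove(ch)) < 2;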
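Because Remove consumes letters from the list, the list has to be rebuilt for every word when checking a whole file. The following is a sketch of that idea wrapped in a reusable method. The names TileMatcher and Matches, the wildcards parameter, and the sample words are all hypothetical. It also assumes the words and the letters use the same case, and it rejects empty lines, which the snippet above does not do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TileMatcher
{
    // Hypothetical helper: true if `word` can be spelled from `tiles`
    // plus at most `wildcards` letters that the tiles do not cover.
    public static bool Matches(string word, string tiles, int wildcards = 1)
    {
        // Fresh copy per word, because Remove consumes tiles as they are used.
        var remaining = new List<char>(tiles);
        int misses = word.Count(ch => !remaining.Remove(ch));
        return word.Length > 0 && misses <= wildcards;
    }

    static void Main()
    {
        // Hypothetical sample; for a real word list the lines could come
        // from File.ReadLines with the path of the text file.
        string[] words = { "PIES", "PEEP", "SEE", "PASTE" };

        foreach (string w in words.Where(w => Matches(w, "PES")))
            Console.WriteLine(w);
        // Prints:
        // PIES
        // SEE
    }
}
```

"SEE" is accepted here because the second E is covered by the wildcard, whereas "PEEP" and "PASTE" each need two letters beyond P, E and S.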
Upvotes: 1