Reputation: 10709
This one is a little complicated and somewhat out of my league. I want to go through a list of words and keep only those made up of a specific set of characters. The characters can be in any order, and some may occur more often than others.
I want the regex to look for any words with:
e: 0 or 1 times
a: 0 or 1 times
t: 0, 1 or 2 times
For example, the following would work:
eat
tea
tate
tt
a
e
The following would not work:
eats
teas
tates
ttt
aa
ee
Lookaround Regex is new to me, so I'm not 100% sure on the syntax (any answer using a lookaround with an explanation would be awesome). My best guess so far:
Regex regex = new Regex(@"(?=.*e)(?=.*a)(?=.*t)");
lines = lines.Where(x => regex.IsMatch(x)).ToArray(); // 'lines' is an array containing the words
Upvotes: 4
Views: 318
Reputation:
This is probably the same as the others; I haven't formatted those to find out.
Note that assertions are forced to match: they can't be optional
(unless explicitly made optional, but what for?), and they are not directly affected by backtracking.
This works; the explanation is in the formatted regex.
updated
To use a whitespace boundary, use this:
(?<!\S)(?!\w*(?:e\w*){2})(?!\w*(?:a\w*){2})(?!\w*(?:t\w*){3})[eat]+(?!\S)
Formatted:
 (?<! \S )
 (?!
      \w*
      (?: e \w* ){2}
 )
 (?!
      \w*
      (?: a \w* ){2}
 )
 (?!
      \w*
      (?: t \w* ){3}
 )
 [eat]+
 (?! \S )
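As a rough check of the whitespace-boundary pattern, here is a minimal C# sketch that runs it over the sample words from the question. The class and helper names (`WhitespaceBoundaryDemo`, `Filter`) are made up for illustration; only the two pattern strings come from this thread. The question's original attempt is included for comparison: each of its positive lookaheads has to succeed, so it demands all three letters and puts no limit on counts or on other characters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WhitespaceBoundaryDemo
{
    // Whitespace-boundary pattern, copied from the answer.
    public const string Answer =
        @"(?<!\S)(?!\w*(?:e\w*){2})(?!\w*(?:a\w*){2})(?!\w*(?:t\w*){3})[eat]+(?!\S)";

    // The question's original attempt: three positive lookaheads.
    public const string Attempt = @"(?=.*e)(?=.*a)(?=.*t)";

    // Hypothetical helper: keep the words on which the pattern matches.
    public static string[] Filter(string pattern, string[] words)
    {
        var regex = new Regex(pattern);
        return words.Where(w => regex.IsMatch(w)).ToArray();
    }

    public static void Main()
    {
        // Sample words from the question: six that should pass, six that should not.
        string[] lines = { "eat", "tea", "tate", "tt", "a", "e",
                           "eats", "teas", "tates", "ttt", "aa", "ee" };

        Console.WriteLine(string.Join(" ", Filter(Answer, lines)));
        // prints: eat tea tate tt a e

        Console.WriteLine(string.Join(" ", Filter(Attempt, lines)));
        // prints: eat tea tate eats teas tates
    }
}
```

Because each entry is a single word with no surrounding whitespace, `(?<!\S)` and `(?!\S)` effectively force the match to cover the whole string here.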
To use an ordinary word boundary, use this:
\b(?!\w*(?:e\w*){2})(?!\w*(?:a\w*){2})(?!\w*(?:t\w*){3})[eat]+\b
Formatted:
 \b                     # Word boundary
 (?!                    # Lookahead, assert there are not 2 'e's
      \w*
      (?: e \w* ){2}
 )
 (?!                    # Lookahead, assert there are not 2 'a's
      \w*
      (?: a \w* ){2}
 )
 (?!                    # Lookahead, assert there are not 3 't's
      \w*
      (?: t \w* ){3}
 )
                        # At this point all the checks pass;
                        # all that's left is to match the letters.
                        # -------------------------------------------------
 [eat]+                 # 1 or more of these: consume letters 'e', 'a' or 't'
 \b                     # Word boundary
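The two boundary styles only differ when a word touches punctuation. The C# sketch below pulls matches out of a made-up input string with both variants; the class name, the `Words` helper and the input text are assumptions for illustration, not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BoundaryComparison
{
    // The three negative lookaheads plus the letter run, shared by both variants.
    const string Checks =
        @"(?!\w*(?:e\w*){2})(?!\w*(?:a\w*){2})(?!\w*(?:t\w*){3})[eat]+";

    public static readonly Regex WordBoundary =
        new Regex(@"\b" + Checks + @"\b");

    public static readonly Regex WhitespaceBoundary =
        new Regex(@"(?<!\S)" + Checks + @"(?!\S)");

    // Hypothetical helper: return every match found in the text.
    public static string[] Words(Regex regex, string text) =>
        regex.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        const string text = "eat tea-time tt ttt"; // made-up input

        // \b treats the hyphen as a boundary, so "tea" is picked out of "tea-time".
        Console.WriteLine(string.Join(",", Words(WordBoundary, text)));
        // prints: eat,tea,tt

        // The whitespace version only accepts words delimited by whitespace or string ends.
        Console.WriteLine(string.Join(",", Words(WhitespaceBoundary, text)));
        // prints: eat,tt
    }
}
```

If you paste the formatted, commented version of the pattern into C# instead of the one-line form, `RegexOptions.IgnorePatternWhitespace` is the option that tells .NET to ignore the spacing and `#` comments.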
Upvotes: 1
Reputation: 336468
Sure:
\b(?:e(?!\w*e)|t(?!(?:\w*t){2})|a(?!\w*a))+\b
Explanation:
\b                 # Start of word
(?:                # Start of group: Either match...
 e                 # an "e",
 (?!\w*e)          # unless another e follows within the same word,
|                  # or
 t                 # a "t",
 (?!               # unless...
  (?:\w*t){2}      # two more t's follow within the same word,
 )                 #
|                  # or
 a                 # an "a"
 (?!\w*a)          # unless another a follows within the same word.
)+                 # Repeat as needed (at least one letter)
\b                 # until we reach the end of the word.
Test it live on regex101.com.
(I've used the \w character class for simplicity's sake; if you want to define your allowed "word characters" differently, replace this accordingly)
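To sanity-check the pattern from C#, a sketch along these lines might help. It runs the regex over the question's sample words and compares each result with a plain character count. The counting function is not part of this answer; it is a hypothetical cross-check, and the class and method names are made up.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PerLetterLookaheadDemo
{
    // Pattern copied from the answer.
    public static readonly Regex Pattern =
        new Regex(@"\b(?:e(?!\w*e)|t(?!(?:\w*t){2})|a(?!\w*a))+\b");

    // Hypothetical cross-check without regex:
    // only e/a/t allowed, at most one e, one a and two t's.
    public static bool CountCheck(string word) =>
        word.Length > 0
        && word.All(c => c == 'e' || c == 'a' || c == 't')
        && word.Count(c => c == 'e') <= 1
        && word.Count(c => c == 'a') <= 1
        && word.Count(c => c == 't') <= 2;

    public static void Main()
    {
        // Sample words from the question.
        string[] words = { "eat", "tea", "tate", "tt", "a", "e",
                           "eats", "teas", "tates", "ttt", "aa", "ee" };

        foreach (var word in words)
        {
            // Print both verdicts side by side for each word.
            Console.WriteLine(
                $"{word}: regex={Pattern.IsMatch(word)}, count={CountCheck(word)}");
        }
    }
}
```

For the question's twelve sample words the two checks agree. Note that `IsMatch` reports a match anywhere in the string, so with `\b` a line such as "don't" would pass on its trailing "t"; if every line must be exactly one word, anchoring with `^` and `$` instead of `\b` is one option (my assumption, not something the answer states).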
Upvotes: 3