Reputation: 697
I'm working on a small project and need regular expressions that accept strings containing each character of a given alphabet at least once.
So for the alphabet {J, K, L}, I would need a RegEx that accepts strings containing J one or more times, AND K one or more times, AND L one or more times, in any order, with any number of duplicate characters before, after, or in between.
I'm pretty inexperienced with RegEx and so have trouble finding "lateral thinking" solutions to many problems. My first approach to this was therefore pretty brute-force: I took each possible "base" string (for example, JKL, JLK, KJL, KLJ, LKJ, LJK) and allowed for any string that could be built up from one of those starting points. However, the resulting regular expression* (despite working) ends up being very long and contains a lot of redundancy. Not to mention that this approach becomes completely untenable once the alphabet has more than a handful of characters, since the number of base strings grows factorially with the alphabet size.
I spent a few hours trying to find a more elegant approach, but I have yet to find one that still accepts every possible string. Is there a method or technique I could be using to get this done in a way that's more elegant and scalable (to larger alphabets)?
*For reference, my regular expression for the listed example:
((J|K|L)*J(J|K|L)*K(J|K|L)*L(J|K|L)*)|
((J|K|L)*J(J|K|L)*L(J|K|L)*K(J|K|L)*)|
((J|K|L)*K(J|K|L)*J(J|K|L)*L(J|K|L)*)|
((J|K|L)*K(J|K|L)*L(J|K|L)*J(J|K|L)*)|
((J|K|L)*L(J|K|L)*J(J|K|L)*K(J|K|L)*)|
((J|K|L)*L(J|K|L)*K(J|K|L)*J(J|K|L)*)
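To make the scaling problem concrete, here is a minimal sketch that generates the brute-force expression for an arbitrary alphabet. The function name `brute_force_pattern` is hypothetical and the alphabet is assumed to be a string of distinct single characters. It produces one branch per permutation, so an alphabet of n characters yields n! branches.

```python
import re
from itertools import permutations

def brute_force_pattern(alphabet):
    """Build the permutation-based regex described above (n! branches)."""
    # "(J|K|L)*": any run of alphabet characters, possibly empty.
    any_char = "(" + "|".join(re.escape(c) for c in alphabet) + ")*"
    branches = []
    for order in permutations(alphabet):
        # Required characters in this order, padded with filler runs.
        body = any_char + "".join(re.escape(c) + any_char for c in order)
        branches.append("(" + body + ")")
    return "|".join(branches)

pattern = brute_force_pattern("JKL")
print(re.fullmatch(pattern, "LLKJ") is not None)  # all three present
print(re.fullmatch(pattern, "JJKK") is not None)  # L is missing
```

For a four-character alphabet the same function already emits 24 branches, which is why this construction does not scale.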
Upvotes: 1
Views: 101
Reputation: 9877
If using regex is not a requirement, you could also check for the characters individually:
text = ...  # the string to check
alphabet = 'JKL'
# Passes only if every character of the alphabet occurs in text at least once
assert all(character in text for character in alphabet)
Or if you do not want to allow characters that are not in the alphabet:
assert set(alphabet) == set(text)
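As a sketch, the two checks can be wrapped in small helpers that return a boolean instead of asserting. The function names `contains_all` and `contains_exactly` are made up for illustration.

```python
def contains_all(text, alphabet):
    """True if every character of `alphabet` occurs in `text` (extras allowed)."""
    # Subset test: the alphabet's characters are all among the text's characters.
    return set(alphabet) <= set(text)

def contains_exactly(text, alphabet):
    """True if `text` uses every character of `alphabet` and no others."""
    return set(alphabet) == set(text)

print(contains_all("LJXK", "JKL"))      # X is tolerated here
print(contains_exactly("LJXK", "JKL"))  # X is rejected here
```

Both run in time linear in the length of the text, independent of how many orderings the alphabet allows.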
Upvotes: 3
Reputation: 8413
This is a typical use-case for a lookahead. You can simply use ^(?=[^J]*J)(?=[^K]*K)(?=[^L]*L) to check all your conditions. If your string must also contain only these characters, you can append [JKL]+$ to it.
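The lookahead pattern grows linearly with the alphabet, so it can be generated mechanically. Below is a minimal sketch in Python. The helper name `build_pattern` and its `only_alphabet` flag are assumptions for illustration, and the alphabet is assumed to be a string of single characters.

```python
import re

def build_pattern(alphabet, only_alphabet=False):
    """Compile a regex with one lookahead per required character."""
    # For each character c: skip anything that is not c, then require c.
    lookaheads = "".join(
        "(?=[^{0}]*{0})".format(re.escape(c)) for c in alphabet
    )
    pattern = "^" + lookaheads
    if only_alphabet:
        # Optionally reject characters outside the alphabet.
        pattern += "[" + "".join(re.escape(c) for c in alphabet) + "]+$"
    return re.compile(pattern)

loose = build_pattern("JKL")
strict = build_pattern("JKL", only_alphabet=True)
print(bool(loose.match("LLJXK")))   # J, K and L all present
print(bool(loose.match("JJKK")))    # L is missing
print(bool(strict.match("LLJXK")))  # X is outside the alphabet
```

`re.escape` is used so that alphabet characters with special meaning in a regex (such as `^` or `]`) are treated literally.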
Upvotes: 6