user3776749

Reputation: 697

Finding a simpler Python RegEx for a string that contains each character at least once

I'm working on a small project and have need for Regular Expressions that accept strings that contain each character in a given alphabet at least once.

So for the alphabet {J, K, L} I would need a RegEx that accepts strings containing J one or more times AND K one or more times AND L one or more times, in any order, with any number of duplicate characters before, after, or in between.

I'm pretty inexperienced with RegEx and so have trouble finding "lateral thinking" solutions to many problems. My first approach to this was therefore pretty brute-force: I took each possible "base" string, for example,

JKL, JLK, KJL, KLJ, LKJ, LJK

and allowed any string that could be built up from one of those starting points. However, the resulting regular expression* (despite working) ends up being very long and contains a lot of redundancy. Not to mention that the number of alternatives grows factorially with the size of the alphabet, so this approach becomes completely untenable once the alphabet has more than a handful of characters.

I spent a few hours trying to find a more elegant approach, but I have yet to find one that still accepts every possible string. Is there a method or technique I could be using to get this done in a way that's more elegant and scalable (to larger alphabets)?

*For reference, my regular expression for the listed example:

((J|K|L)*J(J|K|L)*K(J|K|L)*L(J|K|L)*)|
((J|K|L)*J(J|K|L)*L(J|K|L)*K(J|K|L)*)|
((J|K|L)*K(J|K|L)*J(J|K|L)*L(J|K|L)*)|
((J|K|L)*K(J|K|L)*L(J|K|L)*J(J|K|L)*)|
((J|K|L)*L(J|K|L)*J(J|K|L)*K(J|K|L)*)|
((J|K|L)*L(J|K|L)*K(J|K|L)*J(J|K|L)*)

Upvotes: 1

Views: 101

Answers (2)

Till Hoffmann

Reputation: 9877

If using regex is not a requirement you could also check for the characters individually:

text = ...
alphabet = 'JKL'
assert all(character in text for character in alphabet)

Or if you do not want to allow characters that are not in the alphabet:

assert set(alphabet) == set(text)
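As a rough sketch, the two checks above could be wrapped in small helper functions. The names contains_all and contains_only_all are made up for illustration and are not part of any library:

```python
def contains_all(text, alphabet):
    """Return True if every character of alphabet occurs in text at least once.

    Characters outside the alphabet are tolerated.
    """
    # Subset test: the alphabet's characters must all appear among text's characters.
    return set(alphabet) <= set(text)


def contains_only_all(text, alphabet):
    """Return True if text uses every character of alphabet and nothing else."""
    # Set equality rules out both missing and extra characters.
    return set(alphabet) == set(text)


print(contains_all("LLKJx", "JKL"))       # extra 'x' is tolerated here
print(contains_only_all("LLKJx", "JKL"))  # extra 'x' is rejected here
```

Both versions only look at the distinct characters of each string, so they should scale to larger alphabets without any change.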

Upvotes: 3

Sebastian Proske

Reputation: 8413

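This is a typical use case for a lookahead. You can simply use ^(?=[^J]*J)(?=[^K]*K)(?=[^L]*L) to check all your conditions. Each lookahead starts at the beginning of the string and asserts that the given character occurs somewhere ahead, without consuming any input, so the order of the characters does not matter and you need only one lookahead per character in the alphabet. If your string must also contain only these characters, you can append [JKL]+$ to it.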
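Since the pattern is just one lookahead per character, it can be generated for any alphabet. Here is a minimal sketch in Python, assuming the alphabet is given as a string of single characters; build_pattern and its only_alphabet parameter are hypothetical names chosen for this example:

```python
import re


def build_pattern(alphabet, only_alphabet=False):
    """Compile a regex requiring every character of alphabet at least once.

    build_pattern and only_alphabet are illustrative names, not an existing API.
    """
    # One lookahead per character, e.g. (?=[^J]*J) for 'J'.
    # re.escape guards against characters that are special in a regex.
    lookaheads = "".join(
        "(?=[^{0}]*{0})".format(re.escape(ch)) for ch in alphabet
    )
    pattern = "^" + lookaheads
    if only_alphabet:
        # Optionally restrict the whole string to characters of the alphabet.
        pattern += "[" + "".join(re.escape(ch) for ch in alphabet) + "]+$"
    return re.compile(pattern)


loose = build_pattern("JKL")
strict = build_pattern("JKL", only_alphabet=True)

print(bool(loose.match("xJyKzL")))   # other characters allowed
print(bool(loose.match("JJKK")))     # no L present
print(bool(strict.match("KLJJ")))    # only J, K, L and all three present
print(bool(strict.match("xJKL")))    # contains a character outside the alphabet
```

The generated pattern grows linearly with the alphabet, instead of needing one alternative per permutation as in the question.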

Upvotes: 6
