K. Obrecht

Reputation: 51

Writing a regular expression to match 1 or 2 letters but not all 3

I am trying to write a regular expression to match strings that contain x, y or z, but with at most two of the three letters appearing in any one string.

For example: valid strings = xxxx, xxxyyyy, xyxyx, zyzzzyyy, xzzzxx.

invalid strings = xyz, xxxyyyyz, zxzyy

I was initially writing it as follows

regex = re.compile("((x*y*)*)|((x*z*)*)|((y*z*)*)")

My logic here was that it would first test for strings with xy then xz then yz. But this is not working unfortunately. It works for my first test string of xyxyxyxyx but for my second string, zyzyzyzy it doesn't match it. Am I using the vertical "or" lines in a wrong way?

Upvotes: 1

Views: 1310

Answers (4)

ikegami

Reputation: 385897

If the string can only contain characters x, y and z:

^([xy]*|[xz]*|[yz]*)$

If the string can contain characters other than x, y and z:

^(?:[^x]+|[^y]+|[^z]+)?$

Partially optimized:

^[^xyz]*(?:[^x]+|[^y]+|[^z]+)?$
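A rough way to check the two general-purpose patterns above. The sample strings are made up for illustration and are not from the question; this is only a sketch, not a benchmark.

```python
import re

# Patterns copied from the answer above.
general = re.compile(r"^(?:[^x]+|[^y]+|[^z]+)?$")
partial = re.compile(r"^[^xyz]*(?:[^x]+|[^y]+|[^z]+)?$")

# Illustrative samples (assumed): strings that also contain other characters.
samples = ["", "abc", "axbyc", "x-y-x", "axbycz", "zyx"]

# A string is accepted when at least one of x, y, z is absent from it.
results = {s: bool(general.match(s)) for s in samples}
partial_results = {s: bool(partial.match(s)) for s in samples}

for s in samples:
    print(repr(s), results[s], partial_results[s])
```

Both patterns should agree on every input; the second one only skips a leading run of non-x/y/z characters before trying the alternatives.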

Optimized (the whitespace is only for readability, so compile it with re.VERBOSE):

^
[^xyz]*
(?: x [^yz]* (?: y [^z]* | z [^y]* )?
|   y [^xz]* (?: x [^z]* | z [^x]* )?
|   z [^xy]* (?: x [^y]* | y [^x]* )?
)?
$
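Because the pattern above is laid out with insignificant whitespace, it presumably needs the verbose flag in Python. A minimal sketch, assuming re.VERBOSE and a hypothetical helper name `at_most_two`:

```python
import re

# The multi-line pattern from the answer, compiled in verbose mode so that
# the spaces and newlines used for layout are ignored.
optimized = re.compile(
    r"""
    ^
    [^xyz]*
    (?: x [^yz]* (?: y [^z]* | z [^y]* )?
    |   y [^xz]* (?: x [^z]* | z [^x]* )?
    |   z [^xy]* (?: x [^y]* | y [^x]* )?
    )?
    $
    """,
    re.VERBOSE,
)


def at_most_two(s):
    """Hypothetical helper: True if s contains at most two of x, y, z."""
    return optimized.match(s) is not None


print(at_most_two("xxxyyyy"))  # two distinct letters
print(at_most_two("xyz"))      # all three letters
```

The idea is that each branch commits to the first letter it sees, then to the second, and never allows the third.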

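Fully optimized, using possessive quantifiers (requires the regex module rather than re, or re on Python 3.11+; also written with insignificant whitespace):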

^
[^xyz]*+
(?: x [^yz]*+ (?: y [^z]*+ | z [^y]*+ )?+
|   y [^xz]*+ (?: x [^z]*+ | z [^x]*+ )?+
|   z [^xy]*+ (?: x [^y]*+ | y [^x]*+ )?+
)?+
$

Upvotes: 0

khelwood

Reputation: 59111

I'm not sure quite how you came up with what you've got, but if you want to match a sequence of (only xs and ys) or (only xs and zs) or (only ys and zs) you can use an expression like this:

^([xy]*|[xz]*|[yz]*)$

Character classes (square brackets) are a convenient way to specify "any one of these characters". So [xy]* means "a sequence of any length composed of only x and y characters".

The ^ and $ (start and end) indicate that the pattern should match your entire string.

Additionally, if you want to prevent "" (the empty string) being matched, you could replace all the * with +.
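As a quick sanity check, here is a sketch that runs the pattern against the example strings from the question. The variable names are my own, and `non_empty` is the `+` variant described above.

```python
import re

pattern = re.compile(r"^([xy]*|[xz]*|[yz]*)$")
non_empty = re.compile(r"^([xy]+|[xz]+|[yz]+)$")  # same idea, rejects ""

# Example strings taken from the question.
valid = ["xxxx", "xxxyyyy", "xyxyx", "zyzzzyyy", "xzzzxx"]
invalid = ["xyz", "xxxyyyyz", "zxzyy"]

valid_ok = all(pattern.match(s) for s in valid)
invalid_rejected = not any(pattern.match(s) for s in invalid)

print(valid_ok, invalid_rejected)
print(bool(pattern.match("")), bool(non_empty.match("")))
```

Note that `$` also matches just before a trailing newline; `re.fullmatch` without the anchors is an alternative if that matters.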

Upvotes: 1

dawg

Reputation: 103884

You need word-boundary assertions (\b) at the start and end of each word, and alternation (|) between the three different character classes:

\b([xy]+|[zy]+|[xz]+)\b


You can also use a simpler regex, \b[xyz]+\b, and combine it with Python logic:

[w for w in re.findall(r'\b[xyz]+\b', txt) if len(set(w))<=2]
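To compare the two approaches side by side, here is a small sketch. The value of `txt` is an assumed sample, and the first pattern uses a non-capturing group so that `re.findall` returns the whole match.

```python
import re

# Assumed sample text: a mix of words from the question plus an unrelated word.
txt = "xxxx xyz xyxyx zxzyy zyzzzyyy hello"

# Pure-regex approach: alternation between the three two-letter classes.
by_regex = re.findall(r"\b(?:[xy]+|[zy]+|[xz]+)\b", txt)

# Regex plus Python logic: grab every x/y/z word, then keep those that
# use at most two distinct letters.
by_filter = [w for w in re.findall(r"\b[xyz]+\b", txt) if len(set(w)) <= 2]

print(by_regex)
print(by_filter)
```

Both lists should contain the same words for this input.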


Upvotes: 0

Ryszard Czech

Reputation: 18611

Use a negative lookahead to reject any string that contains three (or more) different characters:

^(?!.*(.).*(?!\1)(.).*(?!\1|\2).)[xyz]+$


Python:

regex = r"^(?!.*(.).*(?!\1)(.).*(?!\1|\2).)[xyz]+$"
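A short sketch of how this could be exercised, using the example strings from the question (the `matches` dictionary is just for illustration):

```python
import re

regex = r"^(?!.*(.).*(?!\1)(.).*(?!\1|\2).)[xyz]+$"

# Example strings taken from the question.
valid = ["xxxx", "xxxyyyy", "xyxyx", "zyzzzyyy", "xzzzxx"]
invalid = ["xyz", "xxxyyyyz", "zxzyy"]

# The lookahead fails the match as soon as three distinct characters
# can be found in order; otherwise [xyz]+ has to cover the whole string.
matches = {s: bool(re.match(regex, s)) for s in valid + invalid}

for s, ok in matches.items():
    print(s, ok)
```

Since the body is `[xyz]+`, the empty string is not matched by this pattern.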

Explanation

                         EXPLANATION
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    (                        group and capture to \1:
--------------------------------------------------------------------------------
      .                        any character except \n
--------------------------------------------------------------------------------
    )                        end of \1
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      \1                       what was matched by capture \1
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
    (                        group and capture to \2:
--------------------------------------------------------------------------------
      .                        any character except \n
--------------------------------------------------------------------------------
    )                        end of \2
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      \1                       what was matched by capture \1
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      \2                       what was matched by capture \2
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  [xyz]+                   any character of: 'x', 'y', 'z' (1 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string

Upvotes: 1
