Susan
Susan

Reputation: 301

regex PCRE, alphabet with a,b, should match words without consecutive chars

I am trying to build a regex that should match for example.

b
abab
babab

but not

bb
babb
aaaba
abaaba

At the moment I have a(b)|b(a) and it is working for abab. I am missing the first and last letter, for example b or babab.

So I need to specify a alone or b alone or a letter at the end of a word (if the letter before it isn't itself). But I can't figure out how to do that.

I am using http://www.rexv.org/ (Perl PCRE) to try it.

Thanks guys but i forgot to mention: An empty string can also be matched, and i can only use the following

* ? +

|

()

.

Thanks guys!,

i suppose it isn't possible without being able to specify beginning and end of the string to work correctly at http://www.rexv.org/

Upvotes: 2

Views: 181

Answers (4)

Rohit Jain
Rohit Jain

Reputation: 213281

Edit: -

If you don't want to use look-ahead and look-behind assertion, you can use this regex: -

"b?(ab)*|a?(ba)*"  // Will also match `empty string`

Explanation : -

b?   // 0 or 1 b
(    // capture group. 
  ab // Match ab
)*   // group close `0 or more repetition

|

a?(ba)*  // Same with `a` replaced with `b`, and `b` with `a`

Old Answer : -

Use this regex: -

"((?<!a)a|(?<!b)b)*"   // This will also match empty string

It matches a not preceded by another a. Same for b.

(            // Capture group
    (?<!     // Negative Look-behind assertion
        a    // on a
    )
     a       // Match a

    |        // or

    (?<!     // Negative Look-behind assertion
        b    // on b
    )
     b       // Match b
)                 // Close capture group
+  // 1 or more repetition

Upvotes: 0

Yogendra Singh
Yogendra Singh

Reputation: 34367

Instead of building complex match regex, use the simple regex to match the repeating chars and use the opposite as:

    String stringToMatch = "babaab";
    Pattern p1 = Pattern.compile("^[ab]+$");//match the a`s and b`s kind of string
    Pattern p2 = Pattern.compile("([ab])\\1+");//match the repeating a`s and b`s
    Matcher m1 = p1.matcher(stringToMatch);
    Matcher m2 = p2.matcher(stringToMatch);
    if (m1.find() && !m2.find()){//validates it has a's and b's but not repeating
       //valid string
    }

To match any word character, simply use: (\\w)\\1+. This is the best part of it. Simple and extendable to cover more set of characters e.g. abcdabcd etc.

Upvotes: 0

Sina Iravanian
Sina Iravanian

Reputation: 16296

Try this one:

^((b?(ab)*a?)|(a?(ba)*b?))$

This assumes your alphabet is limited to {a, b}.

Upvotes: 0

newfurniturey
newfurniturey

Reputation: 38446

Try something like this:

^((?:(?:ab)*a?)|(?:(?:ba)*b?))$

Explained:

^(                   # beginning of the string
    (?:
        (?:ab)*      # matches any repeating `ab` group
        a?           # group can optionally end with an `a`
    )
    |
    (?:
        (?:ba)*      # matches any repeating `ba` group
        b?           # group can optionally end with a `b`
    )
)$                   # end of the string

I include the subgroups as non-capturing with the leading (?: using a full-capturing group around the entire regex. This will make sure to return you only the full-strings that match instead of the noise of each sub-group.

The caveat to this approach is that an "empty" string will also match.

UPDATE (limited set of characters)
Your limited-set of characters will still work with my pattern above, however, we'll need to drop the non-matching group portion (?:). The regex will end up as:

(((ab)*a?)|((ba)*b?))

The caveat mentioned above is that it will also match an empty string, however, this appears to be what you need so we can add that to the bonus-list!

A slight issue with the set of characters you're allowed to use is that you aren't allowed to use the ^ and $ characters which indicate the start and end of a string, respectively. The problem with this is that, any sub-pattern that is matched (regardless of the regex you use) will flag the input as valid. I'm assuming that this is accounted for though.

Upvotes: 2

Related Questions