user6739736


Writing a regular expression for a string in Java

I am trying to write a single regular expression for a string. Say there is a string RBY_YBR, where _ represents an empty slot. We can repeatedly swap a letter with a _ until the result is RRBBYY_. Two or more groups of identical letters can be formed, and a single run such as RRR is also fine.

Conditions
1). Every letter must have the same letter as its left or right neighbour.
2). If there is no _, the letters must already be grouped, like RRBBYY, not RBRBYY or RBYRBY etc.
3). There can be more than one underscore _.
With the regular expression I am trying to find out whether a given string can be turned into a pattern of consecutive identical letters by swapping letters with _.
The regular expression I wrote is

String regEx = "[A-ZA-Z_]";

But this regular expression fails for RBRB: there is no empty slot to move the letters into, and RBRB is not already grouped, so it should be rejected.
How can I write a regular expression that solves this?

Upvotes: 2

Views: 113

Answers (2)

Addison

Reputation: 8347

Please take my answer with a grain of salt, since it's a bit of a "Fastest gun in the West" post.

It follows the same assumptions as Florian Albrecht's answer. (thanks)

I believe that this will solve your problem:

(([A-Za-z])(\2|_)+)+

https://regex101.com/r/7TfSVc/1

It works by capturing a letter in the second group and then requiring that it be followed by one or more repetitions of that same letter (the backreference \2) or underscores.

Known bug: it does not work if an underscore starts a string.
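Since the question is about Java, here is a minimal sketch of how the first regex above could be tried out. The class name RegexGroupingDemo and the helper isGrouped are hypothetical names chosen for illustration. Note that the backreference has to be written as \\2 inside a Java string literal, and that matches() only succeeds if the whole string matches.

```java
import java.util.regex.Pattern;

class RegexGroupingDemo {
    // The first regex from this answer; in a Java string literal the
    // backreference \2 must be escaped as \\2.
    static final Pattern GROUPED = Pattern.compile("(([A-Za-z])(\\2|_)+)+");

    // Hypothetical helper: true only if the entire string matches the regex.
    static boolean isGrouped(String s) {
        return GROUPED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"RRBBYY", "RR_BB", "RBRB", "_RR", "RBY_YBR"};
        for (String s : samples) {
            System.out.println(s + " -> " + isGrouped(s));
        }
    }
}
```

This should report true for RRBBYY and RR_BB and false for RBRB and _RR (the known bug). It also reports false for RBY_YBR, because the regex only checks whether the letters are already grouped; it does not test whether they could be rearranged.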

EDIT

This one is better, though I forgot what I was doing by the end of it.

(([A-Za-z_])(\2|_)+|_+[A-Za-z]_*)+

https://regex101.com/r/7TfSVc/4

Upvotes: 0

Florian Albrecht

Reputation: 2326

OK, as I understand it, a matching string must either consist only of identical characters grouped together, or contain at least one underscore.

So, RRRBBR would be invalid, while RRRRBB, RRRBBR_, and RRRBB_R_ would all be valid.

After a comment from the asker, there is an additional condition: every character must occur either 0 times or at least 2 times.

As far as I know, this is not possible with regular expressions, as regular expressions are finite-state machines without "storage". You would have to "store" each character found in the string to check that it does not appear again later.

I would suggest a very simple method for verifying such strings:

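public static boolean matchesMyPattern(String s) {
    // With at least one underscore, letters can be moved around freely.
    boolean withUnderscore = s.contains("_");

    // Number of occurrences of each letter A-Z.
    int[] found = new int[26];

    for (int i = 0; i < s.length(); i++) {
        char ch = s.charAt(i);
        // Only upper-case letters and underscores are allowed.
        if (ch != '_' && (ch < 'A' || ch > 'Z')) {
            return false;
        }

        // Without an underscore, a letter must not reappear after its run
        // has been interrupted by a different letter.
        if (ch != '_' && i > 0 && s.charAt(i - 1) != ch && found[ch - 'A'] > 0
                && !withUnderscore) {
            return false;
        }
        if (ch != '_') {
            found[ch - 'A']++;
        }
    }

    // A letter that occurs exactly once can never get an equal neighbour.
    for (int i = 0; i < found.length; i++) {
        if (found[i] == 1) {
            return false;
        }
    }

    return true;
}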
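To try the method on the strings mentioned in this thread, a small harness along these lines could be used. This is only a sketch: the class name PatternCheckDemo is hypothetical, and the method body is a condensed copy of the one above so that the example compiles on its own.

```java
class PatternCheckDemo {
    // Condensed copy of matchesMyPattern from the answer above.
    static boolean matchesMyPattern(String s) {
        boolean withUnderscore = s.contains("_");
        int[] found = new int[26];
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '_') {
                continue; // underscores are not counted
            }
            if (ch < 'A' || ch > 'Z') {
                return false; // only A-Z and '_' are accepted
            }
            // Without an underscore, a letter may not reappear after its run ended.
            if (!withUnderscore && i > 0 && s.charAt(i - 1) != ch && found[ch - 'A'] > 0) {
                return false;
            }
            found[ch - 'A']++;
        }
        for (int count : found) {
            if (count == 1) {
                return false; // a lone letter can never be paired
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"RRRBBR", "RRRRBB", "RRRBBR_", "RRRBB_R_", "RBRB", "RBY_YBR"};
        for (String s : samples) {
            System.out.println(s + " -> " + matchesMyPattern(s));
        }
    }
}
```

With these inputs, RRRBBR and RBRB should come out as false, and RRRRBB, RRRBBR_, RRRBB_R_ and RBY_YBR as true, which agrees with the examples given earlier in this answer and in the question.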

Upvotes: 1
