user3580890

Reputation: 191

Need help for writing regular expression

I am weak at writing regular expressions, so I'm going to need some help with this one. I need a regular expression that validates that a string is a set of letters (each letter must be unique) delimited by commas.

Each element is a single character, followed by a comma (except for the last one).

Examples:

A,E,R
R,A
E,R

Thanks

Upvotes: 0

Views: 72

Answers (5)

hwnd

Reputation: 70732

You can use a repeated group to validate it's a comma separated string.

^[AER](?:,[AER])*$

To also require that the characters are unique (no repeats), you would do something like:

^([AER])(?:,(?!\1)([AER])(?!.*\2))*$
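A minimal Java sketch of how the two expressions above behave, assuming the allowed letters are A, E and R as in the question. The class and method names (RepeatedGroupDemo, hasValidFormat, hasUniqueLetters) are made up for illustration.

```java
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Format only: letters from A, E, R separated by single commas.
    static final Pattern FORMAT = Pattern.compile("^[AER](?:,[AER])*$");

    // Format plus uniqueness: every later letter must differ from the first
    // letter (?!\1) and must not occur again further right (?!.*\2).
    static final Pattern UNIQUE =
            Pattern.compile("^([AER])(?:,(?!\\1)([AER])(?!.*\\2))*$");

    static boolean hasValidFormat(String s) {
        return FORMAT.matcher(s).matches();
    }

    static boolean hasUniqueLetters(String s) {
        return UNIQUE.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"A,E,R", "R,A", "A,A", "A,E,A", "A,", "X"}) {
            System.out.println(s + " format=" + hasValidFormat(s)
                    + " unique=" + hasUniqueLetters(s));
        }
    }
}
```

The first pattern accepts A,A because it only checks the layout; the second one rejects it.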

Upvotes: 3

Justin

Reputation: 25297

Note: I'm going to answer the original question. That is, I don't care if the elements repeat.

We've had several suggestions for this regex:

^([AER],)*[AER]$

Which does indeed work. However, to match a String, it first has to back up one character because it will find that there is no , at the end. So we switch it for this to increase performance:

^[AER](,[AER])*$

Notice that this will match a correct String the very first time it attempts to. But also note that we don't need to worry about the ( )* backing up at all; it will either match the first time, or it won't match the String at all. So we can further improve performance by using a possessive quantifier:

^[AER](,[AER])*+$

This will take the whole String and attempt to match it. If it fails, then it stops, saving time by not doing useless backing up.
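As a rough check that the three variants above accept and reject the same inputs, here is a small Java sketch. It only compares results, not timing; the class name PossessiveQuantifierDemo and the results helper are hypothetical.

```java
import java.util.regex.Pattern;

class PossessiveQuantifierDemo {
    // Comma inside the repeated group, final letter matched last.
    static final Pattern COMMA_LAST = Pattern.compile("^([AER],)*[AER]$");
    // First letter matched up front, then repeated ",letter" pairs.
    static final Pattern COMMA_FIRST = Pattern.compile("^[AER](,[AER])*$");
    // Same as COMMA_FIRST, but *+ never gives back what it has matched.
    static final Pattern POSSESSIVE = Pattern.compile("^[AER](,[AER])*+$");

    // Returns the verdict of each pattern, in the order declared above.
    static boolean[] results(String s) {
        return new boolean[] {
            COMMA_LAST.matcher(s).matches(),
            COMMA_FIRST.matcher(s).matches(),
            POSSESSIVE.matcher(s).matches()
        };
    }

    public static void main(String[] args) {
        for (String s : new String[] {"A,E,R", "A", "A,A", "A,", ",A", "A,EE"}) {
            boolean[] r = results(s);
            System.out.println(s + " -> " + r[0] + " " + r[1] + " " + r[2]);
        }
    }
}
```

Any speed difference would have to be measured separately; for three letters it is unlikely to matter.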


If I were trying to ensure the String had no repeated elements, I would not use regex; it just complicates things. You end up with less-readable code (sadly, most people don't understand regex) and, oftentimes, slower code. So I would build my own validator:

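import java.util.HashSet;

public static boolean isCommaDelimitedSet(String toValidate, HashSet<Character> toMatch) {
    // Letters sit at even indices and commas at odd ones, so a valid string has odd length.
    if (toValidate.length() % 2 == 0) return false;
    HashSet<Character> seen = new HashSet<>(); // letters encountered so far
    for (int index = 0; index < toValidate.length(); index++) {
        if (index % 2 == 0) {
            if (!toMatch.contains(toValidate.charAt(index))) return false;
            if (!seen.add(toValidate.charAt(index))) return false; // repeated element
        } else {
            if (toValidate.charAt(index) != ',') return false;
        }
    }
    return true;
}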

This assumes that you want to be able to pass in a set of characters that are allowed. If you don't want that and have explicit chars you want to match, replace the toMatch.contains check in the if (index % 2 == 0) block with:

char c = toValidate.charAt(index);
if (c != 'A' && c != 'E' && c != 'R' /* and so on */) return false;
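For comparison, a different hand-written sketch (not the code from this answer) splits the input on commas and tracks the letters already seen in a HashSet. The names SplitValidatorDemo, AER and isUniqueCommaDelimited are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SplitValidatorDemo {
    // Example set of allowed letters, taken from the question.
    static final Set<Character> AER = new HashSet<>(Arrays.asList('A', 'E', 'R'));

    static boolean isUniqueCommaDelimited(String input, Set<Character> allowed) {
        Set<Character> seen = new HashSet<>();
        // A limit of -1 keeps trailing empty strings, so "A," yields ["A", ""]
        // and the empty token is rejected below.
        for (String token : input.split(",", -1)) {
            if (token.length() != 1) return false;     // empty or multi-character element
            char c = token.charAt(0);
            if (!allowed.contains(c)) return false;    // letter not in the allowed set
            if (!seen.add(c)) return false;            // add() returns false on a repeat
        }
        return true;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"A,E,R", "A", "A,A,R", "A,", ",A", "AA,R,E"}) {
            System.out.println(s + " -> " + isUniqueCommaDelimited(s, AER));
        }
    }
}
```

Splitting allocates an array of tokens, so this trades a little efficiency for readability compared with an index-based loop.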

Upvotes: 1

Mateusz Dymczyk

Reputation: 15141

Something like this "^([AER],)*[AER]$"

@Edit: regarding the uniqueness, if you can drop the "last character cannot be a comma" requirement (which can be checked before the regex anyway in constant time) then this should work:

"^(?:([AER],?)(?!.*\\1))*$"

This will also match "A,E,R," (with a trailing comma), hence you need that check before performing the regex. I do not take responsibility for the performance, but since it's only 3 letters anyway...

The above is written as a Java string literal, obviously; if you want the "pure" regex, it is ^(?:([AER],?)(?!.*\1))*$

@Edit2: sorry, missed one thing: this actually requires that check, and then you need to append a comma to the end of the input before matching, since otherwise it will also match A,E,E. Kind of limited, I know.
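Put together, the procedure described in this answer (reject a trailing comma up front, append a comma, then apply the regex) could look roughly like the Java sketch below. The class name AppendCommaDemo and the method isValid are hypothetical.

```java
import java.util.regex.Pattern;

class AppendCommaDemo {
    // The uniqueness regex from this answer, written as a Java string literal.
    static final Pattern UNIQUE_WITH_COMMA =
            Pattern.compile("^(?:([AER],?)(?!.*\\1))*$");

    static boolean isValid(String s) {
        // Constant-time pre-check: the raw input must not end with a comma.
        if (s.endsWith(",")) return false;
        // Append a comma so every element is captured as "X," and the
        // lookahead can detect a repeated last element such as A,E,E.
        return UNIQUE_WITH_COMMA.matcher(s + ",").matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"A,E,R", "A,E,E", "A,A", "A,E,R,", "AE"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

One further limitation worth knowing: because the comma inside the group is optional, this sketch still accepts AE (two letters with no comma between them), so it does not fully enforce the delimiter.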

Upvotes: 2

nhahtdh

Reputation: 56809

My own ugly but extensible solution, which will disallow leading and trailing commas, and checks that the characters are unique.

It uses a forward reference: note how the second capturing group appears after the reference made to it in (?!.*\2). On the first repetition, since the second capturing group hasn't captured anything yet, Java treats any attempt to reference the text matched by the second capturing group as a failure, so the negative lookahead succeeds.

^([AER])(?!.*\1)(?:,(?!.*\2)([AER]))*+$

Demo on regex101 (PCRE flavor has the same behavior for this case)

Demo on RegexPlanet

Test cases:

A,E,R
A,R,E
E,R,A
A
R,E
R
E

A,
A,R,
A,A,R
E,A,E
A,E,E
X,R,E
R,A,E,
,A
AA,R,E
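The test cases above can be run against the expression with a small Java harness like the following sketch. It assumes the first group of cases is meant to match and the second group is meant to be rejected; the class and field names are made up for illustration.

```java
import java.util.regex.Pattern;

class ForwardReferenceDemo {
    // The regex from this answer, written as a Java string literal.
    static final Pattern UNIQUE =
            Pattern.compile("^([AER])(?!.*\\1)(?:,(?!.*\\2)([AER]))*+$");

    // First block of test cases: expected to match.
    static final String[] SHOULD_MATCH = {
        "A,E,R", "A,R,E", "E,R,A", "A", "R,E", "R", "E"
    };

    // Second block of test cases: expected to be rejected.
    static final String[] SHOULD_FAIL = {
        "A,", "A,R,", "A,A,R", "E,A,E", "A,E,E", "X,R,E", "R,A,E,", ",A", "AA,R,E"
    };

    static boolean matches(String s) {
        return UNIQUE.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : SHOULD_MATCH) {
            System.out.println("\"" + s + "\" expected match, got " + matches(s));
        }
        for (String s : SHOULD_FAIL) {
            System.out.println("\"" + s + "\" expected no match, got " + matches(s));
        }
    }
}
```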

Upvotes: 1

ajb

Reputation: 31689

If I understand it correctly, a valid string will be a series (possibly zero long) of two-character patterns, where each pattern is a letter followed by a comma; finally followed at the end by one letter.

Thus:

"^([A-Za-z],)*[A-Za-z]$"

EDIT: Since you've clarified that the letters have to be A, E, or R:

"^([AER],)*[AER]$"

Upvotes: 2
