user1635689

Reputation: 79

Uppercase Words & Characters java matches

I can't get a simple regex to work. Right now I have the following Java code:

String regex = "^([^A-Za-z]*?[A-Z][A-Za-z]*?)+.?";
String string = "AQUA, CETEARYL ALCOHOL, CETYL ESTERS, BEHENTRIMONIUM CHLORIDE, CETRIMONIUM CHLORIDE, AMODIMETHICONE, TRIDECETH-12, PARFUM, METHYLPARABEN, HEXYL CINNAMAL, LINALOOL, BENZYL SALICYLATE, LIMONENE, LAMINARIA DIGITATA, CHAMOMILLA RECUTITA , ANICOZANTHOS FLAVIDUS, SODIUM BENZ0ATE, PHENOXYETHANOL, ETHYLPARABEN, BUTYLPARABEN, PROPYLPARABEN, P0LYS0RBATE 20, CI 19140, CI 14700.";
System.out.println(string.matches(regex)); 

The problem is that the execution never ends. Please use my regex only to see where I fail. What I need sounds simple to me:

- There can be any text.
- All words in this text should start with an uppercase letter.
- If there are single characters, they should be uppercase too.
- Anything in between (numbers, commas, ...) should always be matched.

See the complex sample above. Simple samples are:

Test, Test, Test = true
Test, test, Test = false
Test, 7-Test Test, Test = true
Test, 7-Test test, Test = false
na = false
NA = true
N/A = true
PHENOXYETHANOL, P0LYS0RBATE 20, CI 19140, CI 14700. = true

Thanks a lot!!!

Upvotes: 1

Views: 3941

Answers (4)

DaoWen

Reputation: 33019

This seems to work on all the inputs you provided:

"^((^|[^A-Za-z]+)[A-Z][A-Za-z]*)*[^A-Za-z]*$"

I'm not sure how your validator works, but it doesn't hurt to force matching the full string by adding the ^ and $ symbols on either end.

Your regular expression never terminates because it nests lazy *? (match zero or more) quantifiers inside a repeated group, which makes the state space explode: a run of letters such as AQUA can be split across repetitions of the group in many different ways, and when the overall match fails the engine backtracks through all of them. Notice how I use a + on the [^A-Za-z] group, which forces it to match at least one non-letter between match groups. This keeps the number of possible matches reasonable. However, since mine matches a full string (it starts with ^ and ends with $) it can only find a single match anyway.
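As a quick sanity check, here is a minimal sketch that runs the pattern from the top of this answer against the samples in the question. The class name CapitalizedWordsCheck and the helper isValid are made up for illustration and are not part of the original code.

```java
import java.util.regex.Pattern;

class CapitalizedWordsCheck {
    // Pattern from this answer: each run of ASCII letters must start with A-Z.
    // Compiled once so it can be reused for many inputs.
    static final Pattern WORDS =
            Pattern.compile("^((^|[^A-Za-z]+)[A-Z][A-Za-z]*)*[^A-Za-z]*$");

    // Hypothetical helper; matches() requires the whole input to match.
    static boolean isValid(String text) {
        return WORDS.matcher(text).matches();
    }

    public static void main(String[] args) {
        // Samples copied from the question.
        String[] samples = {
            "Test, Test, Test",
            "Test, test, Test",
            "Test, 7-Test Test, Test",
            "Test, 7-Test test, Test",
            "na",
            "NA",
            "N/A",
            "PHENOXYETHANOL, P0LYS0RBATE 20, CI 19140, CI 14700."
        };
        for (String sample : samples) {
            System.out.println(sample + " = " + isValid(sample));
        }
    }
}
```

Because at least one non-letter is required between two words, there is only one way to split a run of letters, so a failing input is rejected quickly instead of triggering long backtracking.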

Edit:

If you don't want the empty string to match then change the second-to-last * to a +:

"^((^|[^A-Za-z]+)[A-Z][A-Za-z]*)+[^A-Za-z]*$"

Upvotes: 1

f.leno

Reputation: 151

Maybe this regex works for you:

\p{Upper}*[^\p{Lower}]*\p{Upper}*

it means:

\p{Upper} any uppercase character

[^\p{Lower}] any character except lowercase ones

Note: an empty text will match too.
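A small sketch of how this pattern behaves with String.matches, which requires the whole input to match. The class name NoLowercaseCheck and the helper isValid are invented for the example. Under matches() the pattern effectively accepts any text that contains no lowercase ASCII letter, so capitalized words such as "Test" are rejected, which may or may not be what the question wants.

```java
class NoLowercaseCheck {
    // Pattern from this answer, with backslashes doubled for a Java string literal.
    static final String REGEX = "\\p{Upper}*[^\\p{Lower}]*\\p{Upper}*";

    // Hypothetical helper: true when the whole text matches the pattern.
    static boolean isValid(String text) {
        return text.matches(REGEX);
    }

    public static void main(String[] args) {
        String[] samples = {
            "NA",
            "N/A",
            "na",
            "",
            "Test, Test, Test",
            "PHENOXYETHANOL, P0LYS0RBATE 20, CI 19140, CI 14700."
        };
        for (String sample : samples) {
            System.out.println("\"" + sample + "\" = " + isValid(sample));
        }
    }
}
```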

Upvotes: 0

mwikblom

Reputation: 361

This might work for you:

String regex = "^([A-Z0-9]+[A-Za-z0-9,./\\-]\\s)+$";

You may need to add some more separators (, . / and - in the example). Note that the backslashes have to be doubled inside a Java string literal.

Upvotes: 0

Victor Mukherjee

Reputation: 11025

You'd better split on a delimiter, e.g. with a StringTokenizer, and then check each token; it will be a lot easier. Use ',' as the delimiter, then trim each token and check it with a regex.
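A minimal sketch of that idea, under some assumptions: the per-token pattern below is not from this answer; it is one possible rule that requires every run of letters in a token to start with an uppercase letter and each token to contain at least one word. The names TokenCheck and allTokensValid are hypothetical.

```java
import java.util.StringTokenizer;
import java.util.regex.Pattern;

class TokenCheck {
    // Assumed per-token rule (not from the answer): optional non-letters,
    // then capitalized words separated by at least one non-letter.
    static final Pattern TOKEN = Pattern.compile(
            "[^A-Za-z]*[A-Z][A-Za-z]*(?:[^A-Za-z]+[A-Z][A-Za-z]*)*[^A-Za-z]*");

    // Hypothetical helper: split on ',', trim each token, check it with the regex.
    static boolean allTokensValid(String text) {
        StringTokenizer tokens = new StringTokenizer(text, ",");
        if (!tokens.hasMoreTokens()) {
            return false; // nothing to check
        }
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken().trim();
            if (!TOKEN.matcher(token).matches()) {
                return false; // first bad token decides the result
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allTokensValid("Test, 7-Test Test, Test"));
        System.out.println(allTokensValid("Test, 7-Test test, Test"));
    }
}
```

Checking token by token also makes it easy to report which entry failed, at the cost of a rule that is tied to the comma as separator.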

Upvotes: 0
