SpencasaurusRex

Reputation: 45

Regex Evaluating Matches Incorrectly

I am having trouble getting a Java-flavored regular expression to evaluate a match correctly. I define the following regular expressions:

//Any digit
static String NUM = "[0-9]";

//Exponent with at most 3 digits
static String EXPONENT = "([Ee][+-]?" + NUM + "(" + NUM + "(" + NUM + ")?)?)";

static String NUMBER = "([+-]?((" + NUM + NUM + "*.?" + NUM + "*)|(." + NUM
        + NUM + "*))" + EXPONENT + "?)";

static String S_COMMA_S = "(( )*,( )*)";

static String NUM_DATA = "(" + NUMBER + "(" + S_COMMA_S + NUMBER + ")*)";

Given how NUM_DATA is defined, a possible match would be "123, 456". As far as I understand, any list of numbers that ends with a number rather than a comma should be valid. However, according to the following test method, it also matches a number list ending in a comma:

public static void main(String[] args) {
        System.out.println(NUM_DATA);
        String s = "123";
        System.out.println(s.matches(NUM_DATA));
        s = "123, 456";
        System.out.println(s.matches(NUM_DATA));
        s = "123, 456,";//HANGING COMMA, SHOULD NOT MATCH
        System.out.println(s.matches(NUM_DATA));
}

Which results in the following output:

(([+-]?(([0-9][0-9]*.?[0-9]*)|(.[0-9][0-9]*))([Ee][+-]?[0-9]([0-9]([0-9])?)?)?)((( )*,( )*)([+-]?(([0-9][0-9]*.?[0-9]*)|(.[0-9][0-9]*))([Ee][+-]?[0-9]([0-9]([0-9])?)?)?))*)
true
true
true

Where are my assumptions going wrong? Or is this behavior incorrect?

EDIT: I suppose I should post the behavior I am expecting:

Matches: (Any list of comma separated numbers, including one number)
    1.222
    1.222, 324.4
    2.51e123, 3e2
    -.123e-12, 32.1231, 1e1, .111, -1e-1
Non-Matches:
    123.321,
    ,
    , 123.321

Upvotes: 2

Views: 61

Answers (2)

anubhava

Reputation: 784958

Your regex can be refactored into a shorter one:

^([+-]?(?:\.\d+|\d+(?:\.\d+)?)(?:[Ee][+-]?\d+)?)(?: *, *([+-]?(?:\.\d+|\d+(?:\.\d+)?)(?:[Ee][+-]?\d+)?))*$

This still meets your requirements, as you can see in this RegEx Demo.

You will get all your numbers in matched groups.

I recommend using this regex with the Pattern and Matcher API to avoid compiling this long regex again and again in String#matches.
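A minimal sketch of that recommendation, assuming the regex above is stored as a Java string literal with every backslash doubled. The class name NumberListMatcher and the helpers isNumberList and numbers are hypothetical names chosen for illustration. Because java.util.regex keeps only the last iteration of a repeated capturing group, this sketch validates with the full pattern and then splits on the separator to collect the individual numbers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

final class NumberListMatcher {
    // The regex from this answer, compiled once and reused for every call.
    private static final Pattern NUM_DATA = Pattern.compile(
            "^([+-]?(?:\\.\\d+|\\d+(?:\\.\\d+)?)(?:[Ee][+-]?\\d+)?)"
            + "(?: *, *([+-]?(?:\\.\\d+|\\d+(?:\\.\\d+)?)(?:[Ee][+-]?\\d+)?))*$");

    // Separator between numbers: a comma surrounded by optional spaces.
    private static final Pattern SEPARATOR = Pattern.compile(" *, *");

    // True if the whole input is a comma-separated list of numbers.
    static boolean isNumberList(String s) {
        return NUM_DATA.matcher(s).matches();
    }

    // Returns the individual numbers as strings, or an empty list if the input is invalid.
    static List<String> numbers(String s) {
        if (!isNumberList(s)) {
            return Collections.emptyList();
        }
        return Arrays.asList(SEPARATOR.split(s));
    }

    public static void main(String[] args) {
        System.out.println(isNumberList("2.51e123, 3e2")); // true
        System.out.println(isNumberList("123.321,"));      // false
        System.out.println(numbers("1.222, 324.4"));       // [1.222, 324.4]
    }
}
```

Pattern instances are immutable and safe to share, so keeping one in a static final field is a common way to pay the compilation cost only once.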

Upvotes: 2

maraca

Reputation: 8743

In your NUMBER regex you have an unescaped ., which matches any character, including the comma at the end. You need to escape it as \., but in Java string literals the \ itself has to be escaped, so it is written as "\\." in a String.
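As a sketch of that fix, here are the definitions from the question with only the two dots in NUMBER escaped. The class name EscapedDotDemo and the helper isNumberList are hypothetical wrappers added for illustration; everything else is taken from the question.

```java
final class EscapedDotDemo {
    // Any digit
    static final String NUM = "[0-9]";

    // Exponent with at most 3 digits
    static final String EXPONENT = "([Ee][+-]?" + NUM + "(" + NUM + "(" + NUM + ")?)?)";

    // Same as the question's NUMBER, except each '.' is written as "\\." so the
    // regex engine sees \. (a literal decimal point) instead of "any character".
    static final String NUMBER = "([+-]?((" + NUM + NUM + "*\\.?" + NUM + "*)|(\\." + NUM
            + NUM + "*))" + EXPONENT + "?)";

    static final String S_COMMA_S = "(( )*,( )*)";

    static final String NUM_DATA = "(" + NUMBER + "(" + S_COMMA_S + NUMBER + ")*)";

    // Hypothetical helper: true if the whole string is a valid number list.
    static boolean isNumberList(String s) {
        return s.matches(NUM_DATA);
    }

    public static void main(String[] args) {
        System.out.println(isNumberList("123"));       // true
        System.out.println(isNumberList("123, 456"));  // true
        System.out.println(isNumberList("123, 456,")); // false: the comma no longer matches '.'
    }
}
```

In the original pattern, the trailing comma in "123, 456," was consumed by the unescaped .? inside the last NUMBER, which is why the third test printed true.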

Upvotes: 2
