Uebertreiberman

Reputation: 141

Java regex pattern not working as intended

I am new to patterns and regex and have encountered a problem which I can't solve. This is my code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {

        Pattern pattern = Pattern.compile("(!?)(fw|ri|le|cl|rs)[\\s,]*(\\d*\\.*\\d*|\"\\w*\")?[\\s,]*(\\d*\\.*\\d*|\"\\w*\")?[\\s,]*(\\d*\\.*\\d*|\"\\w*\")?");
        Matcher matcher = pattern.matcher("!fw 90.0 \"hello\" 70.0");

        matcher.find();
        // Group 0 is the whole match; groups 1..groupCount() are the capturing groups.
        for (int i = 0; i <= matcher.groupCount(); i++) {
            System.out.println("Group " + i + ") " + matcher.group(i));
        }
    }
}

I used regexr.com to create the regex, and on the website it works as planned. It should find 3 arguments, each of which can be either a number or a String, where a String is enclosed in quotation marks. As I said, it works on regexr.com, but in Java it works only when there are no Strings. What am I doing wrong? (The regex without the extra backslashes is (!?)(fw|ri|le|cl|rs)[\s,]*(\d*\.*\d*|"\w*")?[\s,]*(\d*\.*\d*|"\w*")?[\s,]*(\d*\.*\d*|"\w*")? )

Thanks in advance.

Edit: Some examples of what does happen and what doesn't:

Working as intended:

Input: !fw 1.0 2.0 3.0

Output:
Group 0) !fw 1.0 2.0 3.0
Group 1) !
Group 2) fw
Group 3) 1.0
Group 4) 2.0
Group 5) 3.0

Not working as intended:

Input: !fw 1.0 \"hello\" 3.0

Output:
Group 0) !fw 1.0
Group 1) !
Group 2) fw
Group 3) 1.0
Group 4)
Group 5)

Intended Output:
Group 0) !fw 1.0 "hello" 3.0
Group 1) !
Group 2) fw
Group 3) 1.0
Group 4) "hello"
Group 5) 3.0

Upvotes: 2

Views: 1270

Answers (3)

matt freake

Reputation: 5090

The way I debug an issue like this is to simplify your non-working pattern and String if necessary, until it works, and then start building it up again until it breaks.

In your case the "hello" part is where it is currently failing, so simplify your string to:

"!fw 90.0 \"h"

so you only have the beginning of hello and simplify your regexp to:

(!?)(fw|ri|le|cl|rs)[\\s,]*(\\d*\\.*\\d*|\"\\w*\")?[\\s,]*(\"\\w)

so the last group should only match a non-optional " and one letter. This works fine with your string.

So then I gradually make that last part

(\"\\w)

more like your

(\\d*\\.*\\d*|\"\\w*\")

and repeat until it stops matching again. This happens as soon as I have a \\d* in front of the |, so that \d* is causing the problem. Why? As Pshemo says, the engine tries to match 0 or more digits before it even tries the second part of the 'or'. Because matching 0 digits counts as a match, the first alternative succeeds with an empty string, and because nothing later in the pattern fails, the engine never backtracks to try your \w part.

As Pshemo mentions, changing \d* to \d+ fixes that, and is probably closer to what you actually want to match.
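The effect described above can be reproduced in isolation. The following is a minimal sketch, not part of the original code: the class name EmptyAlternativeDemo and the helper firstGroup are made up for illustration, and the patterns are cut down to just the alternation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyAlternativeDemo {
    // Hypothetical helper: returns what group 1 captures in the first match, or null if there is no match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "\"hello\"";
        // \d* can match zero digits, so the first alternative succeeds with an empty match.
        System.out.println("[" + firstGroup("(\\d*|\"\\w*\")", input) + "]");
        // \d+ needs at least one digit, so the engine moves on to the quoted-word alternative.
        System.out.println("[" + firstGroup("(\\d+|\"\\w*\")", input) + "]");
    }
}
```

The first call prints [] and the second prints ["hello"], which is the same difference that shows up in groups 4 and 5 of the full pattern.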

Upvotes: 0

Cecilya

Reputation: 527

You can get your regex to work if you switch the order of the expression for Strings and numbers:

(!?)(fw|ri|le|cl|rs)[\\s,]*(\"\\w*\"|\\d*\\.*\\d*)?[\\s,]*(\"\\w*\"|\\d*\\.*\\d*)?[\\s,]*(\"\\w*\"|\\d*\\.*\\d*)?
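As a quick check of the reordering, here is a small sketch run against the failing input from the question. It assumes all three argument groups use the same String-first alternation; the names ReorderedAlternationDemo, ARG and groups are introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReorderedAlternationDemo {
    // One optional argument: quoted word first, number-like token second (assumed identical for all three).
    static final String ARG = "(\"\\w*\"|\\d*\\.*\\d*)?";
    static final Pattern PATTERN = Pattern.compile(
            "(!?)(fw|ri|le|cl|rs)[\\s,]*" + ARG + "[\\s,]*" + ARG + "[\\s,]*" + ARG);

    // Returns groups 0..groupCount() of the first match, or an empty list if nothing matches.
    static List<String> groups(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> g = groups("!fw 1.0 \"hello\" 3.0");
        for (int i = 0; i < g.size(); i++) {
            System.out.println("Group " + i + ") " + g.get(i));
        }
    }
}
```

With the String alternative tried first, groups 3 to 5 come out as 1.0, "hello" and 3.0 for this input.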

However, I'm not sure your regex does exactly what you want it to do - it matches a lot more, to be more specific. E.g.:

!fw ...""

This is because so much in your regex is optional or can be repeated any number of times (like the dot, which I'm guessing is not what you intended). Assuming you want exactly 3 groups, each either a String or a number with an optional decimal part, separated by whitespace, a comma or nothing, you should use this regex:

(!?)(fw|ri|le|cl|rs)([\\s,]*(\"\\w*\"|\\d+(\\.\\d+)?)[\\s,]*){3}

This will match Strings such as:

!fw 90.0 \"hello\" 70.0

!fw \"hello\" 70.0

!fw\"hello\"70.0

but will not match

!fw ...\"\"

This is because in your regex you specify \\d*\\.*\\d*, which means "0-n digits, 0-n dots, 0-n digits". By changing \\.* to \\.? you specify "0-1 dots", which takes care of your dot problem. But this regex would still match . or .9, which is why you make the first number compulsory with a + and then add an optional group for the decimal part, (\\.\\d+)?, which means "1 dot and 1-n digits". Now it will match numbers without a decimal point and numbers with a decimal point, but not numbers such as 3. or .3.

The {3} specifies that you want exactly three occurrences of this group. If you kept these groups optional with a * you would also get results for input with 0-2 occurrences of your pattern. If this is your intended behaviour, you should consider whether you want to allow multiple whitespaces or commas to appear between your numbers/Strings. If not, you should make them dependent on whether there was a String/number before.
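The {3} version can be tried against the example inputs with a sketch like the one below. The class name ExactlyThreeArgsDemo and the helpers found and lastArgument are made up for this illustration, and it uses find(), as the code in the question does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExactlyThreeArgsDemo {
    static final Pattern PATTERN = Pattern.compile(
            "(!?)(fw|ri|le|cl|rs)([\\s,]*(\"\\w*\"|\\d+(\\.\\d+)?)[\\s,]*){3}");

    // True if the pattern is found anywhere in the input.
    static boolean found(String input) {
        return PATTERN.matcher(input).find();
    }

    // Group 4 is the inner argument group; a repeated group keeps only its last iteration.
    static String lastArgument(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group(4) : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "!fw 90.0 \"hello\" 70.0",
            "!fw \"hello\" 70.0",
            "!fw\"hello\"70.0",
            "!fw ...\"\""
        };
        for (String input : inputs) {
            System.out.println(input + " -> found=" + found(input)
                    + ", last argument=" + lastArgument(input));
        }
    }
}
```

Two things are worth keeping in mind. A group under {3} only retains its last iteration, so group 4 holds just the final argument; if each argument has to be read separately, three explicit groups are still needed. Also, the inputs with only two arguments are found because the engine can backtrack and split 70.0 into 7 and 0.0 to reach three occurrences, which may or may not be what you want.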

Upvotes: 1

Pshemo

Reputation: 124225

One workaround could be changing \\d*\\. to \\d+\\. in each group. This prevents the groups from accepting empty strings, as currently happens with groups 4 and 5 (since the empty match is accepted before the |"\w*" part is even checked).
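A minimal sketch of this change applied to the original pattern, with \\d+\\.*\\d* in each of the three groups. The names NonEmptyNumberDemo, ARG and groups are hypothetical and exist only in this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonEmptyNumberDemo {
    // One optional argument: a number that needs at least one leading digit, or a quoted word.
    static final String ARG = "(\\d+\\.*\\d*|\"\\w*\")?";
    static final Pattern PATTERN = Pattern.compile(
            "(!?)(fw|ri|le|cl|rs)[\\s,]*" + ARG + "[\\s,]*" + ARG + "[\\s,]*" + ARG);

    // Returns groups 0..groupCount() of the first match, or an empty list if nothing matches.
    static List<String> groups(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(groups("!fw 1.0 \"hello\" 3.0"));
        // Missing arguments now show up as null instead of as empty strings.
        System.out.println(groups("!fw 1.0"));
    }
}
```

Because the number alternative can no longer match an empty string, an argument group that finds nothing does not take part in the match at all, and matcher.group(i) returns null for it.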

Upvotes: 1
