Identity1

Reputation: 1165

Regex pattern fails in Java but works fine elsewhere

I've implemented a fairly complicated pattern to match all occurrences of a ship set number. It works perfectly with a global, case-insensitive match.

I use the following code to do the same thing in Java, but it doesn't match. Does a regex need to be written differently in Java?

int i = 0;
while (i < elementsArray.size()) {
    System.out.println("List element:"+elementsArray.get(i));
    String theRegex = "(?i)(([Ss]{2}|Ship\\s*(set))\\s*(\\#|Number|No\\.)?\\s*([:=\\-\\n\\'\\s])?\\s*\\d+\\s*(\\W*\\d+\\W?\\s*(to|and)?|(to|and)\\s*\\d+)*)";
    if (elementsArray.get(i).matches(theRegex)) {
        System.out.println("RESULT:");
        String shipsets = "";
        String thePattern = "(?i)(([Ss]{2}|Ship\\s*(set))\\s*(\\#|Number|No\\.)?\\s*([:=\\-\\n\\'\\s])?\\s*\\d+\\s*(\\W*\\d+\\W?\\s*(to|and)?|(to|and)\\s*\\d+)*)";
        Pattern pattern = Pattern.compile(thePattern);
        Matcher matcher = pattern.matcher(elementsArray.get(i));

        if (matcher.find()) {
            shipsets = matcher.group(0);
        }

        System.out.println("text==========" + shipsets);
    }

    i++;
}

Upvotes: 0

Views: 103

Answers (2)

m.cekiera

Reputation: 5385

In my opinion, your problems are caused by:

  1. Using matches() in if(elementsArray.get(i).matches(theRegex)). matches() returns true only if the whole string matches the regex. So it will succeed for many of the strings in your example, but it will fail for: SS#1,SS#5,SS#6, SS1, SS2, SS3, SS4, etc. You can simulate this situation by adding ^ at the beginning and $ at the end of the regex. Check how it matches HERE. A better solution is to use matcher.find() instead of String.matches(), as in Tim Biegeleisen's answer.
  2. Using if(matcher.find()) instead of while(matcher.find()). Some of your strings contain more than one result, so you need to call matcher.find() repeatedly to get all of them. An if runs only once, so you get only the first matched fragment of a given string. To retrieve them all, use a loop: matcher.find() returns false when it finds no further match in the string, which ends the loop.
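To make point 1 concrete, here is a small sketch that uses the regex from the question unchanged. The class and method names (MatchesVsFind, wholeMatch, firstFind, anchoredFind) and the input "Note: SS#1" are made up for illustration; the input just stands in for a line where the ship set number is surrounded by other text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind {
    // The pattern from the question, unchanged
    static final String REGEX =
        "(?i)(([Ss]{2}|Ship\\s*(set))\\s*(\\#|Number|No\\.)?\\s*([:=\\-\\n\\'\\s])?\\s*\\d+\\s*(\\W*\\d+\\W?\\s*(to|and)?|(to|and)\\s*\\d+)*)";

    // String.matches() succeeds only if the WHOLE input matches the pattern
    static boolean wholeMatch(String input) {
        return input.matches(REGEX);
    }

    // Matcher.find() succeeds if the pattern occurs ANYWHERE in the input;
    // returns the first match, or null if there is none
    static String firstFind(String input) {
        Matcher m = Pattern.compile(REGEX).matcher(input);
        return m.find() ? m.group(0) : null;
    }

    // Anchoring with ^...$ makes find() behave much like matches()
    // ($ may also match just before a trailing line terminator, so it is close, not identical)
    static boolean anchoredFind(String input) {
        return Pattern.compile("^" + REGEX + "$").matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "Note: SS#1"; // hypothetical line with text before the ship set
        System.out.println("matches(): " + wholeMatch(input));   // false
        System.out.println("find():    " + firstFind(input));    // SS#1
        System.out.println("^...$:     " + anchoredFind(input)); // false
    }
}
```

The same string "SS#1" on its own would pass wholeMatch, which is why matches() appears to work for some inputs and not others.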

Check this out: it is Tim Biegeleisen's solution with one small change (while instead of if).
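A minimal sketch of the while version, again with the question's regex. The class name FindAllShipsets, the helper findAll and both input strings are hypothetical; the first one stands in for a line holding several separate ship set numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllShipsets {
    // Same pattern as in the question; compiled once and reused
    static final Pattern SHIPSET = Pattern.compile(
        "(?i)(([Ss]{2}|Ship\\s*(set))\\s*(\\#|Number|No\\.)?\\s*([:=\\-\\n\\'\\s])?\\s*\\d+\\s*(\\W*\\d+\\W?\\s*(to|and)?|(to|and)\\s*\\d+)*)");

    // Returns every match in the input, not just the first one
    static List<String> findAll(String input) {
        List<String> results = new ArrayList<>();
        Matcher matcher = SHIPSET.matcher(input);
        // Each call to find() resumes after the previous match and
        // returns false once no further match exists, which ends the loop
        while (matcher.find()) {
            results.add(matcher.group(0));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(findAll("SS#1, SS#5; SS6"));         // [SS#1, SS#5, SS6]
        System.out.println(findAll("Ship Set #2, #33 to #4.")); // [Ship Set #2, #33 to #4.]
    }
}
```

With if instead of while, the first call would return only [SS#1]. The second input is still a single match, because the repeated group at the end of the pattern absorbs the ", #33 to #4." tail.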

Upvotes: 1

Tim Biegeleisen

Reputation: 522797

Here is a simplification of your code which should work, assuming that your regex works correctly in Java. From my preliminary investigation, it does seem to match many of the use cases in your link. You don't need String.matches(), because you are already using a Matcher, which checks whether or not there is a match.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShipsetDemo {
    public static void main(String[] args) {
        List<String> elementsArray = new ArrayList<String>();
        elementsArray.add("Shipset Number 323");
        elementsArray.add("meh");
        elementsArray.add("SS NO. : 34");
        elementsArray.add("Mary had a little lamb");
        elementsArray.add("Ship Set #2, #33 to #4.");

        // Compile the pattern once instead of on every iteration
        String thePattern = "(?i)(([Ss]{2}|Ship\\s*(set))\\s*(\\#|Number|No\\.)?\\s*([:=\\-\\n\\'\\s])?\\s*\\d+\\s*(\\W*\\d+\\W?\\s*(to|and)?|(to|and)\\s*\\d+)*)";
        Pattern pattern = Pattern.compile(thePattern);

        for (int i = 0; i < elementsArray.size(); ++i) {
            System.out.println("List element:" + elementsArray.get(i));
            String shipsets = "";
            Matcher matcher = pattern.matcher(elementsArray.get(i));

            // find() searches for the pattern anywhere in the string,
            // so no separate String.matches() check is needed
            if (matcher.find()) {
                shipsets = matcher.group(0);
                System.out.println("Found a match at element " + i + ": " + shipsets);
            }
        }
    }
}

As the output below shows, the three ship set test strings all matched, while the controls "meh" and "Mary had a little lamb" did not.

Output:

List element:Shipset Number 323
Found a match at element 0: Shipset Number 323
List element:meh
List element:SS NO. : 34
Found a match at element 2: SS NO. : 34
List element:Mary had a little lamb
List element:Ship Set #2, #33 to #4.
Found a match at element 4: Ship Set #2, #33 to #4.

Upvotes: 2
