Reputation: 1165
I've implemented quite a complicated pattern to match all occurrences of a ship set number. It works perfectly fine with global, case-insensitive comparison.
I use the following code to implement the same thing in Java, but it doesn't match. Should a Java regex be implemented differently?
int i = 0;
while (i < elementsArray.size()) {
    System.out.println("List element:"+elementsArray.get(i));
    String theRegex = "(?i)(([Ss]{2}|Ship\\s*(set))\\s*(\\#|Number|No\\.)?\\s*([:=\\-\\n\\'\\s])?\\s*\\d+\\s*(\\W*\\d+\\W?\\s*(to|and)?|(to|and)\\s*\\d+)*)";
    if (elementsArray.get(i).matches(theRegex)) {
        System.out.println("RESULT:");
        String shipsets = "";
        String thePattern = "(?i)(([Ss]{2}|Ship\\s*(set))\\s*(\\#|Number|No\\.)?\\s*([:=\\-\\n\\'\\s])?\\s*\\d+\\s*(\\W*\\d+\\W?\\s*(to|and)?|(to|and)\\s*\\d+)*)";
        Pattern pattern = Pattern.compile(thePattern);
        Matcher matcher = pattern.matcher(elementsArray.get(i));
        if (matcher.find()) {
            shipsets = matcher.group(0);
        }
        System.out.println("text==========" + shipsets);
    }
    i++;
}
Upvotes: 0
Views: 103
Reputation: 5385
In my opinion your problems are caused by:
- matches() in if(elementsArray.get(i).matches(theRegex)): matches() returns true only if the whole string matches the regex, so it will succeed in many cases from your example, but it will fail with SS#1,SS#5,SS#6 or SS1, SS2, SS3, SS4, etc. You can simulate this situation by adding ^ at the beginning and $ at the end of the regex. Check how it matches HERE. So it would be a better solution to use matcher.find() instead of String.matches(), like in Tim Biegeleisen's answer.
- if(matcher.find()) instead of while(matcher.find()): in some of the strings you want to retrieve more than one result, so you should call matcher.find() multiple times to get all of them. However, if will act only once, so you will get only the first matched fragment from the given string. To retrieve all of them, use a loop: matcher.find() returns false when it cannot find another match in the given string, which ends the loop.
Check this out. This is Tim Biegeleisen's solution with a small change (while instead of if).
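To make the two points above concrete, here is a minimal sketch. It uses a deliberately simplified stand-in pattern (an assumption for illustration, not the full regex from the question), and the class and method names are hypothetical. It contrasts a whole-string match with a find() loop on one of the inputs mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShipSetFindDemo {
    // Simplified stand-in pattern (assumption): "SS", optional "#", then digits.
    // It is NOT the full regex from the question.
    static final Pattern SHIP_SET = Pattern.compile("(?i)SS\\s*#?\\s*\\d+");

    // Same semantics as String.matches(): true only if the ENTIRE input matches.
    static boolean matchesWhole(String input) {
        return SHIP_SET.matcher(input).matches();
    }

    // Collects every match by calling find() until it returns false.
    static List<String> findAll(String input) {
        List<String> results = new ArrayList<>();
        Matcher matcher = SHIP_SET.matcher(input);
        while (matcher.find()) {
            results.add(matcher.group());
        }
        return results;
    }

    public static void main(String[] args) {
        String input = "SS#1,SS#5,SS#6";
        System.out.println(matchesWhole(input)); // false: the commas are not covered by the pattern
        System.out.println(findAll(input));      // [SS#1, SS#5, SS#6]
    }
}
```

With if instead of while in findAll, only the first fragment (SS#1) would be collected.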
Upvotes: 1
Reputation: 522797
Here is a simplification of your code which should work, assuming that your regex works correctly in Java. From my preliminary investigations, it does seem to match many of the use cases in your link. You don't need to use String.matches() because you are already using a Matcher, which will check whether or not you have a match.
// Requires java.util.ArrayList, java.util.List, java.util.regex.Matcher and java.util.regex.Pattern
List<String> elementsArray = new ArrayList<String>();
elementsArray.add("Shipset Number 323");
elementsArray.add("meh");
elementsArray.add("SS NO. : 34");
elementsArray.add("Mary had a little lamb");
elementsArray.add("Ship Set #2, #33 to #4.");
for (int i=0; i < elementsArray.size(); ++i) {
    System.out.println("List element:"+elementsArray.get(i));
    String shipsets = "";
    String thePattern = "(?i)(([Ss]{2}|Ship\\s*(set))\\s*(\\#|Number|No\\.)?\\s*([:=\\-\\n\\'\\s])?\\s*\\d+\\s*(\\W*\\d+\\W?\\s*(to|and)?|(to|and)\\s*\\d+)*)";
    Pattern pattern = Pattern.compile(thePattern);
    Matcher matcher = pattern.matcher(elementsArray.get(i));
    // find() looks for the pattern anywhere in the element; no separate matches() check is needed
    if (matcher.find()) {
        shipsets = matcher.group(0);
        System.out.println("Found a match at element " + i + ": " + shipsets);
    }
}
You can see in the output below that the three ship set test strings all matched, and the controls "meh" and "Mary had a little lamb" did not match.
Output:
List element:Shipset Number 323
Found a match at element 0: Shipset Number 323
List element:meh
List element:SS NO. : 34
Found a match at element 2: SS NO. : 34
List element:Mary had a little lamb
List element:Ship Set #2, #33 to #4.
Found a match at element 4: Ship Set #2, #33 to #4.
Upvotes: 2