Reputation: 1029
I am using Pattern and Matcher to create a regexp which will extract the 6 values from within the li tags I have below:
<ul class="Bold"> <li class="ball-orange">2</li> <li class="ball-orange">10</li> <li class="ball-orange">11</li> <li class="ball-orange">15</li> <li class="ball-orange">22</li> <li class="ball-orange">39</li> </ul>
I.e. the result of the regexp needs to be groups with 2, 10, 11, 15, 22, 39.
I have the following code:
Pattern numbersPattern = Pattern.compile(".*(<li class=\"ball-orange\">([0-9]{1,2})</li>).*");
Matcher matchNumbers = numbersPattern.matcher(mainBlock);//mainBlock is the string I quoted above which contains all the li's
System.out.println("Numbers Match? " + matchNumbers.matches());//this returns true
System.out.println(matchNumbers.group(2));//returns 39, i.e. second group but for the last li
//this loop never gets entered!!!
while (matchNumbers.find()) {
    System.out.println("group 1: " + matchNumbers.group(1));
    System.out.println("group 2: " + matchNumbers.group(2));
    //System.out.println("group 3: " + matchNumbers.group(3));
}
So it matches the very last li, as you can see from the comments, but it does not enter the while (matchNumbers.find()) loop. I.e. I want (<li class=\"ball-orange\">([0-9]{1,2})</li>) to be found 6 times and output in the loop, but it is not.
I am following the tutorial at http://tutorials.jenkov.com/java-regex/matcher.html#groups-inside-groups.
Why is the loop not being entered, and how can I get the groups of li's to be matched?
Upvotes: 1
Views: 560
Reputation: 643
You have to change your regex to "<li class=\"ball-orange\">(\\d{1,2})</li>", i.e. drop the leading and trailing .* and keep one capturing group around the digits.
You can also use \d instead of [0-9] (written as \\d inside a Java string literal).
Pattern numbersPattern = Pattern.compile("<li class=\"ball-orange\">(\\d{1,2})</li>");
Matcher matchNumbers = numbersPattern.matcher(mainBlock); // mainBlock is the string quoted in the question, which contains all the li's
// Do not call matches() first: this pattern no longer covers the whole string.
// Each find() call moves on to the next li.
while (matchNumbers.find()) {
    System.out.println("whole li: " + matchNumbers.group());
    System.out.println("number: " + matchNumbers.group(1));
}
Upvotes: -1
Reputation: 46480
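Currently you're matching the whole string greedily (anything + li + anything), which is why matches() == true. That successful match also covers the entire input, so a following find() on the same matcher resumes at the end of the string, finds nothing, and the loop is never entered. If you want all of the li's, just remove the .* parts, because find() will then find your pattern multiple times, first at index 18, then at index 49, and so on: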
String mainBlock = "<ul class=\"Bold\"> <li class=\"ball-orange\">2</li> <li class=\"ball-orange\">10</li> <li class=\"ball-orange\">11</li> <li class=\"ball-orange\">15</li> <li class=\"ball-orange\">22</li> <li class=\"ball-orange\">39</li> </ul>";
Pattern listPattern = Pattern.compile("<li class=\"ball-orange\">([0-9]{1,2})</li>");
Matcher matcher = listPattern.matcher(mainBlock);
while (matcher.find()) {
    System.out.println("whole thing: " + matcher.group()); // or group(0)
    System.out.println("number: " + matcher.group(1));
}
Output:
whole thing: <li class="ball-orange">2</li>
number: 2
whole thing: <li class="ball-orange">10</li>
number: 10
whole thing: <li class="ball-orange">11</li>
number: 11
whole thing: <li class="ball-orange">15</li>
number: 15
whole thing: <li class="ball-orange">22</li>
number: 22
whole thing: <li class="ball-orange">39</li>
number: 39
Note: you never need to put a group around the whole regex. Group 0 is by definition the whole match, which is why the numbering of capturing groups starts at 1.
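If the goal is to work with the six values rather than print them, the same loop can collect group 1 into a list. This is only a minimal sketch: the class name LiNumberExtractor and the method extractNumbers are made up for illustration, and it assumes the markup looks exactly like the sample in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class LiNumberExtractor {
    // Same pattern as above: no surrounding .*, one capturing group for the digits.
    private static final Pattern LI_PATTERN =
            Pattern.compile("<li class=\"ball-orange\">([0-9]{1,2})</li>");

    // Returns the numbers found in the li tags, in document order.
    static List<Integer> extractNumbers(String html) {
        List<Integer> numbers = new ArrayList<>();
        Matcher matcher = LI_PATTERN.matcher(html);
        while (matcher.find()) {
            // group(1) holds only the digits; group(0) would be the whole li tag
            numbers.add(Integer.parseInt(matcher.group(1)));
        }
        return numbers;
    }

    public static void main(String[] args) {
        String mainBlock = "<ul class=\"Bold\"> <li class=\"ball-orange\">2</li> <li class=\"ball-orange\">10</li> <li class=\"ball-orange\">11</li> <li class=\"ball-orange\">15</li> <li class=\"ball-orange\">22</li> <li class=\"ball-orange\">39</li> </ul>";
        System.out.println(extractNumbers(mainBlock)); // prints [2, 10, 11, 15, 22, 39]
    }
}
```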
Upvotes: 1
Reputation: 124225
Your regex .*(<li class=\"ball-orange\">([0-9]{1,2})</li>).* will consume the entire string because of the .* at the start and end. If you want to enter the loop, consider using only the (<li class=\"ball-orange\">([0-9]{1,2})</li>) part.
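To see the effect in isolation, here is a small sketch that compares the two patterns on the string from the question. The class and method names (MatchesThenFindDemo, findAfterMatches, countFinds) are invented for this illustration. It shows that find() fails when called right after a successful matches() on the same matcher, and that even on a fresh matcher the .* version yields a single match while the bare li pattern yields six.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class MatchesThenFindDemo {
    static final String MAIN_BLOCK = "<ul class=\"Bold\"> <li class=\"ball-orange\">2</li> <li class=\"ball-orange\">10</li> <li class=\"ball-orange\">11</li> <li class=\"ball-orange\">15</li> <li class=\"ball-orange\">22</li> <li class=\"ball-orange\">39</li> </ul>";

    // The pattern from the question, with .* on both sides.
    static final Pattern GREEDY =
            Pattern.compile(".*(<li class=\"ball-orange\">([0-9]{1,2})</li>).*");

    // The same pattern without the surrounding .* parts.
    static final Pattern LI_ONLY =
            Pattern.compile("(<li class=\"ball-orange\">([0-9]{1,2})</li>)");

    // Calls matches() and then find() on the same matcher, as the question does.
    static boolean findAfterMatches() {
        Matcher m = GREEDY.matcher(MAIN_BLOCK);
        m.matches();     // succeeds and covers the whole input
        return m.find(); // continues from the end of that match
    }

    // Counts how often find() succeeds on a fresh matcher.
    static int countFinds(Pattern pattern) {
        Matcher m = pattern.matcher(MAIN_BLOCK);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(findAfterMatches());  // prints false
        System.out.println(countFinds(GREEDY));  // prints 1
        System.out.println(countFinds(LI_ONLY)); // prints 6
    }
}
```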
Or, even better, use the proper tool instead of regex: an HTML parser such as jsoup:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String mainBlock = "<ul class=\"Bold\"> <li class=\"ball-orange\">2</li> <li class=\"ball-orange\">10</li> <li class=\"ball-orange\">11</li> <li class=\"ball-orange\">15</li> <li class=\"ball-orange\">22</li> <li class=\"ball-orange\">39</li> </ul>";
Document doc = Jsoup.parse(mainBlock);
for (Element el : doc.select("li.ball-orange")) { // pick all <li class="ball-orange"> tags
    System.out.println("li tag: " + el);
    System.out.println("value in li : " + el.text());
}
Output:
li tag: <li class="ball-orange">2</li>
value in li : 2
li tag: <li class="ball-orange">10</li>
value in li : 10
li tag: <li class="ball-orange">11</li>
value in li : 11
li tag: <li class="ball-orange">15</li>
value in li : 15
li tag: <li class="ball-orange">22</li>
value in li : 22
li tag: <li class="ball-orange">39</li>
value in li : 39
Upvotes: 2