pokero

Reputation: 1029

Java regexp matcher - multiple groups

I am using Pattern and Matcher to write a regex that extracts the 6 values from the li tags below:

<ul class="Bold">
    <li class="ball-orange">2</li>
    <li class="ball-orange">10</li>
    <li class="ball-orange">11</li>
    <li class="ball-orange">15</li>
    <li class="ball-orange">22</li>
    <li class="ball-orange">39</li>
</ul>

I.e. the result of the regexp needs to be groups with 2, 10, 11, 15, 22, 39.

I have the following code:

Pattern numbersPattern = Pattern.compile(".*(<li class=\"ball-orange\">([0-9]{1,2})</li>).*");
Matcher matchNumbers = numbersPattern.matcher(mainBlock); // mainBlock is the string quoted above, containing all the li's
System.out.println("Numbers Match? " + matchNumbers.matches()); // this returns true
System.out.println(matchNumbers.group(2)); // returns 39, i.e. the second group, but for the last li

// this loop never gets entered!!!
while (matchNumbers.find()) {
    System.out.println("group 1: " + matchNumbers.group(1));
    System.out.println("group 2: " + matchNumbers.group(2));
    //System.out.println("group 3: " + matchNumbers.group(3));
}

So, as the comments show, it matches the very last li, but it never enters the while (matchNumbers.find()) loop. I want (<li class=\"ball-orange\">([0-9]{1,2})</li>) to be found 6 times and printed in the loop, but it is not.

I am following the tutorial here: http://tutorials.jenkov.com/java-regex/matcher.html#groups-inside-groups.

Why is the loop not being entered, and how can I get the groups of li's to be matched?

Upvotes: 1

Views: 560

Answers (3)

Hamza Alayed

Reputation: 643

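You have to change your regex to "<li class=\"ball-orange\">(\\d{1,2})</li>": drop the leading and trailing .* so that find() can stop at each li, and use \d instead of [0-9]. In a Java string literal the backslash of \d has to be doubled, and < and > need no escaping (\< is not a legal Java escape sequence).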

Pattern numbersPattern = Pattern.compile("<li class=\"ball-orange\">(\\d{1,2})</li>");
Matcher matchNumbers = numbersPattern.matcher(mainBlock); // mainBlock is the string quoted above which contains all the li's

// No matches() call here: matches() tests the entire input against the pattern,
// which no longer succeeds once the .* parts are gone. find() scans for each li instead.
while (matchNumbers.find()) {
    System.out.println("whole li: " + matchNumbers.group());  // group 0, the whole match
    System.out.println("number: " + matchNumbers.group(1));   // the captured digits
}
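A side note on the \d suggestion: by default Java's \d is just the ASCII class [0-9], so swapping one for the other does not change what this regex matches; only the Pattern.UNICODE_CHARACTER_CLASS flag widens \d to other Unicode decimal digits. A small sketch to check that (the class and method names are illustrative, not from the answer):

```java
import java.util.regex.Pattern;

public class DigitClassDemo {
    // Default mode: \d is equivalent to [0-9]
    static boolean matchesDigit(String s) {
        return Pattern.matches("\\d{1,2}", s);
    }

    static boolean matchesRange(String s) {
        return Pattern.matches("[0-9]{1,2}", s);
    }

    // With UNICODE_CHARACTER_CLASS, \d also accepts non-ASCII decimal digits
    static boolean matchesUnicodeDigit(String s) {
        return Pattern.compile("\\d{1,2}", Pattern.UNICODE_CHARACTER_CLASS).matcher(s).matches();
    }

    public static void main(String[] args) {
        String arabicIndicThree = "\u0663"; // ARABIC-INDIC DIGIT THREE
        System.out.println(matchesDigit("39") + " " + matchesRange("39")); // true true
        System.out.println(matchesDigit(arabicIndicThree));                // false
        System.out.println(matchesUnicodeDigit(arabicIndicThree));         // true
    }
}
```

So for the ball numbers the \d change is cosmetic; what lets the loop run is removing the .* parts.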

Upvotes: -1

TWiStErRob

Reputation: 46480

Currently you're matching the whole string greedily (anything + li + anything); that's why matches() == true. Because the leading .* grabs as much as it can, the li it captures is the last one (39). If you want all of them, just remove the .* parts, because find() will then find your pattern multiple times, once at each li in turn:

    String mainBlock = "<ul class=\"Bold\">                                <li class=\"ball-orange\">2</li>                                <li class=\"ball-orange\">10</li>                                <li class=\"ball-orange\">11</li>                                <li class=\"ball-orange\">15</li>                                <li class=\"ball-orange\">22</li>                                <li class=\"ball-orange\">39</li>                            </ul>";
    Pattern listPattern = Pattern.compile("<li class=\"ball-orange\">([0-9]{1,2})</li>");
    Matcher matcher = listPattern.matcher(mainBlock);
    while (matcher.find()) {
        System.out.println("whole thing: " + matcher.group()); // or group(0)
        System.out.println("number: " + matcher.group(1));
    }

whole thing: <li class="ball-orange">2</li>
number: 2
whole thing: <li class="ball-orange">10</li>
number: 10
whole thing: <li class="ball-orange">11</li>
number: 11
whole thing: <li class="ball-orange">15</li>
number: 15
whole thing: <li class="ball-orange">22</li>
number: 22
whole thing: <li class="ball-orange">39</li>
number: 39
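The greedy behaviour described at the top of this answer can be checked directly. This is a minimal sketch (class and method names are illustrative; the markup is rebuilt with shorter whitespace, which does not affect the result): the question's pattern matches the whole string, its groups land on the last li, and a find() after the successful matches() returns false because the whole input has already been consumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // The question's pattern: greedy .* before and after the <li> part
    static final Pattern QUESTION =
            Pattern.compile(".*(<li class=\"ball-orange\">([0-9]{1,2})</li>).*");
    // The same <li> part on its own
    static final Pattern LI = Pattern.compile("<li class=\"ball-orange\">([0-9]{1,2})</li>");

    // Rebuilds the question's markup on one line
    static String buildBlock() {
        StringBuilder sb = new StringBuilder("<ul class=\"Bold\">");
        for (String n : new String[] {"2", "10", "11", "15", "22", "39"}) {
            sb.append("  <li class=\"ball-orange\">").append(n).append("</li>");
        }
        return sb.append("  </ul>").toString();
    }

    static boolean wholeMatch(String s) {
        return QUESTION.matcher(s).matches();
    }

    // The leading .* takes everything, then backs off only far enough for one
    // <li> to fit, so the groups end up on the LAST <li>
    static String numberAfterWholeMatch(String s) {
        Matcher m = QUESTION.matcher(s);
        return m.matches() ? m.group(2) : null;
    }

    // A successful matches() consumed the whole input, so find() resumes at the end
    static boolean findAfterWholeMatch(String s) {
        Matcher m = QUESTION.matcher(s);
        m.matches();
        return m.find();
    }

    // Without the .* parts, each find() stops at the next <li>
    static int countFinds(String s) {
        Matcher m = LI.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String mainBlock = buildBlock();
        System.out.println(wholeMatch(mainBlock));            // true
        System.out.println(numberAfterWholeMatch(mainBlock)); // 39
        System.out.println(findAfterWholeMatch(mainBlock));   // false
        System.out.println(countFinds(mainBlock));            // 6
    }
}
```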

Note: you never need to put a group around the whole regex; group 0 is by definition the whole match, which is why the numbering of capturing groups starts at 1.
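To illustrate the group 0 point, here is a small sketch (illustrative class name) comparing the question's inner pattern, which wraps everything in an extra group, with the same pattern without it. groupCount() reports only the capturing groups, and the outer group just repeats what group 0 already holds.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupZeroDemo {
    // Extra group around the whole <li> (as in the question)
    static final Pattern WRAPPED = Pattern.compile("(<li class=\"ball-orange\">([0-9]{1,2})</li>)");
    // Same pattern without the outer group
    static final Pattern PLAIN = Pattern.compile("<li class=\"ball-orange\">([0-9]{1,2})</li>");

    public static void main(String[] args) {
        String li = "<li class=\"ball-orange\">15</li>";
        Matcher w = WRAPPED.matcher(li);
        Matcher p = PLAIN.matcher(li);
        // groupCount() counts capturing groups only; group 0 is never included
        System.out.println(w.groupCount() + " " + p.groupCount()); // 2 1
        if (w.matches() && p.matches()) {
            // The outer group holds exactly the same text as group 0
            System.out.println(w.group(1).equals(w.group(0)));     // true
            System.out.println(w.group(2) + " " + p.group(1));     // 15 15
        }
    }
}
```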

Upvotes: 1

Pshemo

Reputation: 124225

Your regex .*(<li class=\"ball-orange\">([0-9]{1,2})</li>).* will consume the entire string because of the .* at the start and end. If you want to enter the loop, use only the (<li class=\"ball-orange\">([0-9]{1,2})</li>) part.
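On Java 9 or later, the same "only the parenthesized part" regex can also be consumed without an explicit while loop through Matcher.results(), which streams one MatchResult per find(). A minimal sketch, assuming Java 9+ (the class and method names are illustrative); it keeps the question's group numbering, so group 2 is the number:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ResultsDemo {
    // Only the parenthesized part of the question's regex: no leading or trailing .*
    static final Pattern LI = Pattern.compile("(<li class=\"ball-orange\">([0-9]{1,2})</li>)");

    // Collects group 2 (the number) of every match; Matcher.results() needs Java 9+
    static List<String> numbers(String html) {
        return LI.matcher(html).results()
                .map(r -> r.group(2))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String mainBlock = "<ul class=\"Bold\">"
                + " <li class=\"ball-orange\">2</li> <li class=\"ball-orange\">10</li>"
                + " <li class=\"ball-orange\">11</li> <li class=\"ball-orange\">15</li>"
                + " <li class=\"ball-orange\">22</li> <li class=\"ball-orange\">39</li>"
                + " </ul>";
        System.out.println(numbers(mainBlock)); // [2, 10, 11, 15, 22, 39]
    }
}
```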

Or, even better, use the proper tool instead of a regex: an HTML parser like jsoup:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String mainBlock = "<ul class=\"Bold\">                                <li class=\"ball-orange\">2</li>                                <li class=\"ball-orange\">10</li>                                <li class=\"ball-orange\">11</li>                                <li class=\"ball-orange\">15</li>                                <li class=\"ball-orange\">22</li>                                <li class=\"ball-orange\">39</li>                            </ul>";
Document doc = Jsoup.parse(mainBlock);
for (Element el : doc.select("li.ball-orange")) { // pick all <li class="ball-orange"> tags
    System.out.println("li tag: " + el);
    System.out.println("value in li : " + el.text());
}

Output:

li tag: <li class="ball-orange">2</li>
value in li : 2
li tag: <li class="ball-orange">10</li>
value in li : 10
li tag: <li class="ball-orange">11</li>
value in li : 11
li tag: <li class="ball-orange">15</li>
value in li : 15
li tag: <li class="ball-orange">22</li>
value in li : 22
li tag: <li class="ball-orange">39</li>
value in li : 39

Upvotes: 2
