NotFunny

Reputation: 37

Get last n matches from String with regex

I have a String that is HTML without any kind of closing tag (</.*?>) and without any newline (\n):

<tr><td align=center>01/01/2001<td align=center>500,01<td align=center>0,99<td align=center>15

This repeats indefinitely, and each row may have one or more td's for values. At the moment I am using String.split("<tr><td align=center>") to separate the String, and then one regex to find the date and another to find the value I want.

Something like this:

String[] stringArray = text.split("<tr><td align=center>");

String[] array1 = Arrays.copyOfRange(stringArray, stringArray.length - /*0<n<21*/,
        stringArray.length);

for (int i = 0; i < array1.length; i++) {
    System.out.println(array1[i]);
    m1 = Pattern.compile("(\\d{2}\\/\\d{2}\\/\\d{4})").matcher(
            array1[i]);

    //getting date
    m1.find();
    System.out.println(m1.group(1));

    m1 = Pattern.compile("<td align=center>(\\d+,*\\d*)").matcher(array1[i]);
    while (m1.find()) {
        System.out.println(m1.group(/*0<n*/));
    }
}

I want a way to get a String that is equivalent to array1 (the last n rows of the String), but using a regex.

I know I can use a bigger regex with $ at the end to get the last <tr>, but I want to get all 19 <tr> before it too.

I don't know if I am being clear here. Let me know if I can provide more details.

PS: yes, the values are written with ',' instead of '.'; I use a replace later on.

Upvotes: 2

Views: 225

Answers (1)

Thomas

Reputation: 88707

With Java regular expressions you can't collect an arbitrary number of matches into a single group, so unless you know the exact/maximum number of groups you'd have to apply the regex multiple times and collect the matches yourself.
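As a rough sketch of "apply the regex multiple times and collect the matches yourself": the class below walks over every row with find(), stores each one in a list, and then keeps only the last n entries. The names LastRows, ROW and lastRows are made up for illustration, and the row pattern assumes that every row starts with <tr> and that the input contains no newlines, as described in the question. It also assumes n is not negative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastRows {
    // One row: "<tr>" followed by everything up to the next "<tr>" or the end of the input.
    // Assumption: the input has no newlines, so '.' can cross the whole row.
    private static final Pattern ROW = Pattern.compile("<tr>(.*?)(?=<tr>|$)");

    // Collects every row body, then returns only the last n of them (or all, if there are fewer).
    public static List<String> lastRows(String html, int n) {
        List<String> rows = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            rows.add(m.group(1));
        }
        int from = Math.max(0, rows.size() - n);
        return new ArrayList<>(rows.subList(from, rows.size()));
    }

    public static void main(String[] args) {
        String html = "<tr><td align=center>01/01/2001<td align=center>500,01"
                + "<tr><td align=center>02/01/2001<td align=center>600,02"
                + "<tr><td align=center>03/01/2001<td align=center>700,03";
        // Prints the bodies of the last two rows, one per line.
        for (String row : lastRows(html, 2)) {
            System.out.println(row);
        }
    }
}
```

Each returned element still starts with <td align=center>, unlike the pieces produced by the split in the question, so the value pattern would need adjusting if it should skip the date cell.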

Btw, you should check whether m1.find() returns true before calling m1.group(1); otherwise you'd get an IllegalStateException if the expression doesn't match.

As another note, I'd compile the date pattern outside the loop, probably in some initialization code.
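A minimal sketch of both notes together, assuming a helper class of your own (RowParser and findDate are hypothetical names): the date pattern is compiled once as a constant, and group(1) is only read after find() has returned true.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RowParser {
    // Compiled once and reused for every row, instead of once per loop iteration.
    private static final Pattern DATE = Pattern.compile("(\\d{2}/\\d{2}/\\d{4})");

    // Returns the first dd/mm/yyyy date in the row, or an empty Optional if there is none.
    public static Optional<String> findDate(String row) {
        Matcher m = DATE.matcher(row);
        // group(1) is only safe to call after find() has returned true.
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(findDate("01/01/2001<td align=center>500,01").orElse("no date"));
        System.out.println(findDate("<td align=center>500,01").orElse("no date"));
    }
}
```

Returning an Optional is just one way to signal a missing match; returning null or skipping the row inside the loop would work as well.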

Upvotes: 1
