Andrea

Reputation: 6123

Regex: how to substitute a string with n occurrences of a substring

As a premise, I have some HTML text containing <ol> elements. These have a start attribute, but the framework I'm using cannot interpret it during PDF conversion. So the trick I am trying to apply is to add a number of invisible <li> elements at the beginning of each list.

As an example, suppose this input text:

<ol start="3">
   <li>Element 1</li>
   <li>Element 2</li>
   <li>Element 3</li>
</ol>

I want to produce this result:

<ol>
   <li style="visibility:hidden"></li>
   <li style="visibility:hidden"></li>
   <li>Element 1</li>
   <li>Element 2</li>
   <li>Element 3</li>
</ol>

That is, I want to add n-1 invisible elements to the ordered list, where n is the value of start. But I can't work out how to do that from Java in a general way.

For the exact case in the example, I could do this (using replace, so, to be honest, without regex):

htmlString = htmlString.replace("<ol start=\"3\">",
            "<ol><li style=\"visibility:hidden\"></li><li style=\"visibility:hidden\"></li>");

But, obviously, that only covers the case start="3". I know that I can use groups to extract the "3", but how can I use it as a "variable" to emit the string <li style="visibility:hidden"></li> n-1 times? Thanks for any insight.

Upvotes: 2

Views: 191

Answers (5)

tobias_k

Reputation: 82899

Since Java 9, there's a Matcher.replaceAll method taking a callback function as a parameter:

String text = "<ol start=\"3\">\n\t<li>Element 1</li>\n\t<li>Element 2</li>\n\t<li>Element 3</li>\n</ol>";

String result = Pattern
        .compile("<ol start=\"(\\d+)\">")  // \\d+ also accepts multi-digit values such as start="12"
        .matcher(text)
        .replaceAll(m -> "<ol>" + repeat("\n\t<li style=\"visibility:hidden\" />",
                                         Integer.parseInt(m.group(1)) - 1));

To repeat the string you can use the char-array trick below, a loop, or String.repeat on Java 11 and later.

public static String repeat(String s, int n) {
    return new String(new char[n]).replace("\0", s);
}

Afterwards, result is:

<ol>
    <li style="visibility:hidden" />
    <li style="visibility:hidden" />
    <li>Element 1</li>
    <li>Element 2</li>
    <li>Element 3</li>
</ol>   

If you are stuck with an older version of Java, you can still match and replace in two steps.

Matcher m = Pattern.compile("<ol start=\"(\\d+)\">").matcher(text);
while (m.find()) {
    int n = Integer.parseInt(m.group(1));
    text = text.replace("<ol start=\"" + n + "\">", 
            "<ol>" + repeat("\n\t<li style=\"visibility:hidden\" />", n-1));
}

Update by Andrea ジーティーオー:

I modified the (great) solution above to also handle <ol> tags that have multiple attributes, so that the tag does not end right after start (for example, an <ol> with letters, such as <ol start="4" style="list-style-type: upper-alpha;">). This uses replaceAll so that the whole tag is matched as a regex.

// Match a tag that starts with "<ol start=", has a quoted number, and ends with ">"
Matcher m = Pattern.compile("<ol start=\"(\\d+)\"(.*?)>").matcher(htmlString);
while (m.find()) {
    int n = Integer.parseInt(m.group(1));
    // $2 keeps the remaining attributes, including their leading space
    htmlString = htmlString.replaceAll("(<ol start=\"" + n + "\")(.*?)(>)",
            "<ol$2>" + StringUtils.repeat("\n\t<li style=\"visibility:hidden\" />", n - 1));
}
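
The two variants above can be combined into a single pass. What follows is only a sketch under a few assumptions: Java 11 or later (for String.repeat), start as the first attribute of the tag, and the hypothetical names OlStartExpander and expand, which are not part of the answer. It accepts multi-digit start values, keeps any remaining attributes, and quotes the replacement so that a '$' or '\' in those attributes is not read as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OlStartExpander {
    // Assumes start is the first attribute; group 2 captures whatever follows it.
    private static final Pattern OL_START = Pattern.compile("<ol start=\"(\\d+)\"([^>]*)>");
    private static final String HIDDEN_LI = "\n\t<li style=\"visibility:hidden\"></li>";

    public static String expand(String html) {
        return OL_START.matcher(html).replaceAll(m -> {
            int n = Integer.parseInt(m.group(1));
            // Math.max guards against start="0", where n - 1 would be negative.
            String padding = HIDDEN_LI.repeat(Math.max(0, n - 1));
            // quoteReplacement keeps any '$' or '\' in the kept attributes literal.
            return Matcher.quoteReplacement("<ol" + m.group(2) + ">" + padding);
        });
    }

    public static void main(String[] args) {
        System.out.println(expand(
                "<ol start=\"12\" style=\"list-style-type: upper-alpha;\">\n\t<li>Element 1</li>\n</ol>"));
    }
}
```

Because the callback runs once per match, each list gets its own count, and no second scan of the string is needed.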

Upvotes: 3

Nawnit Sen

Reputation: 1038

You can try this.

String input="<ol    start=\"6\">"+
   "<li>Element 1</li>"+
   "<li>Element 2</li>"+
   "<li>Element 3</li>"+
   "<li>Element 4</li>"+
   "<li>Element 5</li>"+
   "<li>Element 6</li>"+
"</ol>";

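Matcher match = Pattern.compile("<ol\\s+start\\s*=\\s*\"(\\d+)\"\\s*>(.*?)(</ol>)").matcher(input);
String resultString = "";
if (match.find()) {
    // Read the count before replaceAll resets the matcher; note that the
    // same count is then applied to every <ol> that matches.
    int hidden = Integer.parseInt(match.group(1)) - 1;
    resultString = match.replaceAll("<ol>"
            + new String(new char[hidden]).replace("\0", "\n\t<li style=\"visibility:hidden\" />")
            + "$2$3");
}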

Upvotes: 0

Eritrean

Reputation: 16498

Using Jsoup you can write something like:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class JsoupTest {
    public static void main(String[] args){
        String html = "<ol start=\"3\">\n" +
                        "   <li>Element 1</li>\n" +
                        "   <li>Element 2</li>\n" +
                        "   <li>Element 3</li>\n" +
                        "</ol>"
                + "<p>some other html elements</p>"
                + "<ol start=\"5\">\n" +
                        "   <li>Element 1</li>\n" +
                        "   <li>Element 2</li>\n" +
                        "   <li>Element 3</li>\n" +
                        "   <li>Element 4</li>\n" +
                        "   <li>Element 5</li>\n" +
                        "</ol>";

        Document doc = Jsoup.parse(html);
        // Select only lists that carry a start attribute; parseInt would fail on the others
        Elements ols = doc.select("ol[start]");
        for(Element ol :ols){
            int start = Integer.parseInt(ol.attr("start"));
            for(int i=0; i<start-1; i++){
                ol.prependElement("li").attr("style", "visibility:hidden");
            }  
            ol.attributes().remove("start");
            System.out.println(ol);
        }
    }
}

Upvotes: 1

Sudha mohan Panda

Reputation: 7

Use Java's Matcher and Pattern to count the occurrences of the <li> tag, then use StringBuilder's insert method to insert the invisible elements.

int count = 0;
Matcher m = Pattern.compile("<li>").matcher(s); // s is the HTML string
while (m.find()) {
    ++count;
}
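
The StringBuilder.insert step is not shown above, so here is one possible sketch of it. Note the swapped-in choice: it takes the number of hidden items from the start attribute (as the question asks) rather than from a count of <li> tags. The names HiddenItemInserter and insertHidden are hypothetical, and the pattern assumes start is the only attribute on the tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HiddenItemInserter {
    // Assumes the tag has exactly the form <ol start="n">.
    private static final Pattern OL_START = Pattern.compile("<ol start=\"(\\d+)\">");
    private static final String HIDDEN_LI = "<li style=\"visibility:hidden\"></li>";

    public static String insertHidden(String html) {
        // For each matching tag, record {tag start, tag end, items to add}.
        List<int[]> hits = new ArrayList<>();
        Matcher m = OL_START.matcher(html);
        while (m.find()) {
            hits.add(new int[] { m.start(), m.end(), Integer.parseInt(m.group(1)) - 1 });
        }
        StringBuilder sb = new StringBuilder(html);
        // Work from the last match backwards so earlier offsets stay valid.
        for (int i = hits.size() - 1; i >= 0; i--) {
            int[] hit = hits.get(i);
            for (int k = 0; k < hit[2]; k++) {
                sb.insert(hit[1], HIDDEN_LI); // insert right after the opening tag
            }
            sb.replace(hit[0], hit[1], "<ol>"); // then drop the start attribute
        }
        return sb.toString();
    }
}
```

Collecting the offsets first and editing from the end avoids recomputing positions after each insertion.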

Upvotes: -2

Ailef

Reputation: 7906

You cannot do this cleanly using regular expressions; even if you find some hack that works, it's going to be a suboptimal solution.

The right way to do this is to use an HTML parsing library (e.g. Jsoup) and then add the <li> tags as children of the <ol>, specifically using the Element#prepend method. (With Jsoup you can also read the start attribute value to compute how many elements to add.)

Upvotes: 4
