Reputation: 6123
As a premise, I have an HTML text, with some <ol>
elements. These have a start
attribute, but the framework I'm using is not capable to interpret them during a PDF conversion. So, the trick I am trying to apply is to add a number of invisible <li>
elements at the beginning.
As an example, suppose this input text:
<ol start="3">
<li>Element 1</li>
<li>Element 2</li>
<li>Element 3</li>
</ol>
I want to produce this result:
<ol>
<li style="visibility:hidden"></li>
<li style="visibility:hidden"></li>
<li>Element 1</li>
<li>Element 2</li>
<li>Element 3</li>
</ol>
So, adding n-1 invisible elements into the ordered list. But I'm not able to do that from Java in a generalized way.
Supposing the exact case in the example, I could do this (using replace
, so - to be honest - without regex):
htmlString = htmlString.replace("<ol start=\"3\">",
"<ol><li style=\"visibility:hidden\"></li><li style=\"visibility:hidden\"></li>");
But, obviously, it just applies to the case with "start=3". I know that I can use groups to extract the "3", but how can I use it as a "variable" to specify the string <li style=\"visibility:hidden\"></li>
n-1 number of times?
Thanks for any insight.
Upvotes: 2
Views: 191
Reputation: 82899
Since Java 9, there's a Matcher.replaceAll
method taking a callback function as a parameter:
String text = "<ol start=\"3\">\n\t<li>Element 1</li>\n\t<li>Element 2</li>\n\t<li>Element 3</li>\n</ol>";
String result = Pattern
.compile("<ol start=\"(\\d)\">")
.matcher(text)
.replaceAll(m -> "<ol>" + repeat("\n\t<li style=\"visibility:hidden\" />",
Integer.parseInt(m.group(1))-1));
To repeat
the string you can take the trick from here, or use a loop.
public static String repeat(String s, int n) {
return new String(new char[n]).replace("\0", s);
}
Afterwards, result
is:
<ol>
<li style="visibility:hidden" />
<li style="visibility:hidden" />
<li>Element 1</li>
<li>Element 2</li>
<li>Element 3</li>
</ol>
If you are stuck with an older version of Java, you can still match and replace in two steps.
Matcher m = Pattern.compile("<ol start=\"(\\d)\">").matcher(text);
while (m.find()) {
int n = Integer.parseInt(m.group(1));
text = text.replace("<ol start=\"" + n + "\">",
"<ol>" + repeat("\n\t<li style=\"visibility:hidden\" />", n-1));
}
Update by Andrea ジーティーオー:
I modified the (great) solution above for including also <ol>
that have multiple attributes, so that their tag do not end with start
(example, <ol>
with letters, as <ol start="4" style="list-style-type: upper-alpha;">
). This uses replaceAll
to deal with regex as a whole.
//Take something that starts with "<ol start=", ends with ">", and has a number in between
Matcher m = Pattern.compile("<ol start=\"(\\d)\"(.*?)>").matcher(htmlString);
while (m.find()) {
int n = Integer.parseInt(m.group(1));
htmlString = htmlString.replaceAll("(<ol start=\"" + n + "\")(.*?)(>)",
"<ol $2>" + StringUtils.repeat("\n\t<li style=\"visibility:hidden\" />", n - 1));
}
Upvotes: 3
Reputation: 1038
You can try this.
String input="<ol start=\"6\">"+
"<li>Element 1</li>"+
"<li>Element 2</li>"+
"<li>Element 3</li>"+
"<li>Element 4</li>"+
"<li>Element 5</li>"+
"<li>Element6</li>"+
"</ol>";
Matcher match= Pattern.compile("<ol .*start.*=.*\\\"(.*)\\\"\\s*>(.*)(</ol>)").matcher(input);
String resultString ="";
if(match.find()){
resultString =match.replaceAll("<ol>"+new String(new char[Integer.valueOf(match.group(1))-1]).replace("\0", "\n\t<li style=\"visibility:hidden\" />")+"$2$3");
}
Upvotes: 0
Reputation: 16498
Using Jsoup you can write something like:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
class JsoupTest {
public static void main(String[] args){
String html = "<ol start=\"3\">\n" +
" <li>Element 1</li>\n" +
" <li>Element 2</li>\n" +
" <li>Element 3</li>\n" +
"</ol>"
+ "<p>some other html elements</p>"
+ "<ol start=\"5\">\n" +
" <li>Element 1</li>\n" +
" <li>Element 2</li>\n" +
" <li>Element 3</li>\n" +
" <li>Element 4</li>\n" +
" <li>Element 5</li>\n" +
"</ol>";
Document doc = Jsoup.parse(html);
Elements ols = doc.select("ol");
for(Element ol :ols){
int start = Integer.parseInt(ol.attr("start"));
for(int i=0; i<start-1; i++){
ol.prependElement("li").attr("style", "visibility:hidden");
}
ol.attributes().remove("start");
System.out.println(ol);
}
}
}
Upvotes: 1
Reputation: 7
Please use java Matcher and Pattern to count the occurrence of li tag and use StringBuilder insert method to insert invisible elements.
Matcher m = Pattern.compile("<li>").matcher(s);
while(m.find()){
++count;
}
Upvotes: -2
Reputation: 7906
You cannot do this using regular expressions, or even if you find some hack to do this it's going to be a suboptimal solution..
The right way to do this is to use an HTML parsing library (e.g. Jsoup) and then add the <li>
tags as children to the <ol>
, specifically using the Element#prepend method. (With Jsoup you can also read the start
attribute value in order to compute how many elements to add)
Upvotes: 4