Reputation: 2289
Say, I have a String:
String someString = "<html><body><div><div><div class=\"unknown\"><b>Content</b></div></div></div></body></html>";
In this String the position of the "Content" is known.
Now I want to turn the innermost div into a span tag. So what I want to do is something like:
someString.replacePreviousOccurrence(someString.indexOf("Content"), "<div ", "<span>");
someString.replaceNextOccurrence(someString.indexOf("Content"), "</div>", "</span>");
Is there something in Java to do this? Or just to get the index of a previous and next occurrence of a substring from a specified index?
Edit: I forgot to specify that the divs may have unknown attributes (classes and so on), and there may be other markup in between (like the <b> tag in the example).
Upvotes: 0
Views: 251
Reputation: 422
You can do this with a regex, though it may not be the most elegant solution, and it only works while the nested divs follow each other directly, as in your example. Here is a pattern you might use: <div\b([^>]*)>(?!<div\b)(.*)(?<!</div>)</div>
This works by using a negative lookahead and a negative lookbehind. The negative lookahead here: (?!<div\b)
says: match only where this is not directly followed by another opening div tag,
and the negative lookbehind here: (?<!</div>)
says: match only where this is not directly preceded by </div>.
So the pattern, broken down:
<div\b([^>]*)> // matches an opening div tag and captures its attributes, if any
(?!<div\b) // that isn't directly followed by another opening div tag
(.*) // followed by any character any number of times, captured as the content
(?<!</div>) // where the closing tag isn't directly preceded by </div>
</div> // matches </div>
So for this problem you can do something like the following:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String str = "<html><body><div><div><div class=\"unknown\"><b>Content</b></div></div></div></body></html>";
Pattern p = Pattern.compile("<div\\b([^>]*)>(?!<div\\b)(.*)(?<!</div>)</div>");
Matcher m = p.matcher(str);
// Rebuild the matched div as a span, keeping its attributes ($1) and its content ($2).
String output = m.replaceAll("<span$1>$2</span>");
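For comparison, the index-based approach the question asks about can be sketched with String.lastIndexOf(String, int) and String.indexOf(String, int), which search backwards and forwards from a given index. The class and method names below are made up for illustration, and the sketch assumes the nearest "<div" before the marker and the nearest "</div>" after it belong to the same element:

```java
class InnermostDivToSpan {

    // Hypothetical helper (not a JDK method): swaps the div that directly
    // encloses the first occurrence of `marker` for a span.
    static String replaceEnclosingDiv(String html, String marker) {
        int pos = html.indexOf(marker);
        if (pos < 0) {
            return html; // marker not found, nothing to do
        }
        // Previous occurrence: search backwards, starting at pos.
        int open = html.lastIndexOf("<div", pos);
        // Next occurrence: search forwards, starting at pos.
        int close = html.indexOf("</div>", pos);
        if (open < 0 || close < 0) {
            return html; // no enclosing div around the marker
        }
        // Keep everything after "<div" (attributes and content) unchanged.
        return html.substring(0, open)
                + "<span" + html.substring(open + "<div".length(), close)
                + "</span>" + html.substring(close + "</div>".length());
    }

    public static void main(String[] args) {
        String someString = "<html><body><div><div><div class=\"unknown\"><b>Content</b></div></div></div></body></html>";
        System.out.println(replaceEnclosingDiv(someString, "Content"));
    }
}
```

Plain string search knows nothing about markup, so this would also trip over a tag whose name merely starts with "div", or over a div nested between the marker and the closing tag.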
Upvotes: 1
Reputation: 480
You could use the built-in functionality for working with XML, provided your HTML is well-formed XML.
This is, sadly, very verbose, but it works.
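import java.io.*;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;
public static void replaceDivWithSpanByText() throws ParserConfigurationException, IOException, SAXException, XPathExpressionException, TransformerException {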
String html = "<html><body><div><div><div>Content</div></div></div></body></html>";
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xpath = xPathFactory.newXPath();
Node contentNode = (Node) xpath.evaluate(".//div[text() = 'Content']", doc, XPathConstants.NODE);
doc.renameNode(contentNode, null, "span");
DOMSource domSource = new DOMSource(doc);
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.transform(domSource, result);
System.out.println(writer.toString());
}
Note that in this example I use XPath to select the node by its text (".//div[text() = 'Content']"); selecting by id, class, or other attributes is just as easy. If you do this kind of replacement a lot, writing a generic class to handle it could be a good idea.
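A generic helper along those lines might look like the sketch below. The class name ElementRenamer and the method renameFirstMatch are invented for illustration; it assumes the markup is well-formed XML, forces XML output, and omits the XML declaration so that the result stays close to the input:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ElementRenamer {

    // Hypothetical helper: renames the first node matched by xpathExpr.
    // Assumes the markup is well-formed XML; real-world HTML often is not.
    static String renameFirstMatch(String markup, String xpathExpr, String newName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            Node match = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(xpathExpr, doc, XPathConstants.NODE);
            if (match == null) {
                return markup; // nothing matched, leave the input untouched
            }
            doc.renameNode(match, null, newName);

            // Serialize the modified DOM back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            // Wrap parser, XPath and transformer failures in one unchecked exception.
            throw new IllegalArgumentException("Could not process markup", e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><div><div><div class=\"unknown\"><b>Content</b></div></div></div></body></html>";
        // Select by attribute.
        System.out.println(renameFirstMatch(html, "//div[@class='unknown']", "span"));
        // Select the innermost div: one with no div descendants.
        System.out.println(renameFirstMatch(html, "//div[not(.//div)]", "span"));
    }
}
```

The second call does not depend on the text or the class at all: the XPath expression //div[not(.//div)] picks a div that contains no further div, which is the innermost one in the question's example.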
Upvotes: 1