Reputation: 15251
Which library would allow me to evaluate XPath on an HTML string?
I have tried using the javax package, but this seems to fail:
String docroot = "<div><i>items <b>sold</b></i></div>";
XPath xxpath = XPathFactory.newInstance().newXPath();
InputSource docroot = new InputSource(new StringReader(subelements));
String result = (String) xxpath.evaluate("//b", docroot, XPathConstants.STRING);
Upvotes: 1
Views: 2440
Reputation: 148977
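Try the following instead. Your code sample had some errors: docroot was declared twice, and the StringReader wrapped an undefined variable (subelements) instead of the HTML string: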
import java.io.StringReader;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

public class Demo {

    public static void main(String[] args) throws Exception {
        String docroot = "<div><i>items <b>sold</b></i></div>";
        XPath xxpath = XPathFactory.newInstance().newXPath();
        InputSource inputSource = new InputSource(new StringReader(docroot));
        String result = (String) xxpath.evaluate("//b", inputSource, XPathConstants.STRING);
        System.out.println(result);
    }
}
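If you need every match rather than just the string value of the first one, a variation along these lines may help. It is only a sketch: it parses the string into a DOM Document first and asks for XPathConstants.NODESET. The class name XPathNodeSetDemo, the helper selectTexts, and the extra <b>today</b> element are made up for illustration, and it still assumes the markup is well-formed XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathNodeSetDemo {

    // Hypothetical helper: returns the text content of every node matched by the expression.
    static List<String> selectTexts(String markup, String expression) {
        try {
            // Parse the string once into a DOM tree so it can be queried repeatedly.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // NODESET returns all matching nodes instead of a single string.
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            // Malformed markup or an invalid expression ends up here.
            throw new IllegalArgumentException("Could not parse or query the markup", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative input with two <b> elements (an assumption, not the original string).
        String docroot = "<div><i>items <b>sold</b></i><b>today</b></div>";
        System.out.println(selectTexts(docroot, "//b")); // prints [sold, today]
    }
}
```

Parsing to a Document up front also avoids re-reading the InputSource, which can only be consumed once.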
Upvotes: 3
Reputation: 9981
You want a Java HTML parsing library that can produce a valid XML Document object. Based on this unscientific library comparison, it appears HTML Cleaner would do the trick.
From the HTML Cleaner site:
Although the main motive was to prepare ordinary HTML for XML processing with XPath, XQuery and XSLT, structured data produced by HtmlCleaner may be consumed and handled in many other ways.
This documentation link provides an example of how to read in an HTML string, execute an XPath query, and work with the results.
Upvotes: 2
Reputation: 75346
You need a parser that is lenient enough to parse HTML as XML, and those are rare. I believe TagSoup (http://java-source.net/open-source/html-parsers/tagsoup) can do it, but it has been a long time since I looked at it.
Is there any reason you cannot just provide an XHTML snippet?
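To illustrate why the XHTML question matters, here is a rough sketch using only the JDK's own XML parser. The class WellFormedCheck, the helper firstMatchOrNull, and the two sample strings are invented for this example. The first string closes its <br/> and parses fine; the second uses a bare <br>, which is legal HTML but not well-formed XML, so the standard parser rejects it.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class WellFormedCheck {

    // Hypothetical helper: returns the string value of the first match,
    // or null when the markup is not well-formed XML.
    static String firstMatchOrNull(String markup, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (SAXException e) {
            // The strict XML parser refuses markup such as an unclosed <br>.
            return null;
        } catch (Exception e) {
            // Parser configuration, I/O, or XPath expression problems.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<div><i>items <b>sold</b><br/></i></div>"; // well-formed
        String html = "<div><i>items <b>sold</b><br></i></div>";   // <br> never closed

        System.out.println(firstMatchOrNull(xhtml, "//b")); // prints sold
        System.out.println(firstMatchOrNull(html, "//b"));  // prints null
    }
}
```

The default error handler may also log the parse error to stderr for the second string. If your input looks like the second case, that is where a lenient parser such as TagSoup would come in.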
Upvotes: 2