Reputation: 15251

Java: how to parse an HTML string so that XML tools can consume it?

Which library would allow me to evaluate XPath on an HTML string?

I have tried the javax.xml.xpath package, but this seems to fail:

String docroot = "<div><i>items <b>sold</b></i></div>";
XPath xxpath = XPathFactory.newInstance().newXPath();
InputSource docroot = new InputSource(new StringReader(subelements)); 
String result = (String) xxpath.evaluate("//b", docroot, XPathConstants.STRING);

Upvotes: 1

Answers (3)

bdoughan

Reputation: 149047

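Try the following instead. Your code sample had some errors: docroot was declared twice (once as a String and once as an InputSource), and the StringReader was given an undefined variable (subelements) instead of the HTML string.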

import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class Demo {

    public static void main(String[] args) throws Exception {
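        // Works because this particular snippet happens to be well-formed XML.
        String docroot = "<div><i>items <b>sold</b></i></div>";
        XPath xxpath = XPathFactory.newInstance().newXPath();
        // Wrap the string itself in the InputSource, under a separate variable name.
        InputSource inputSource = new InputSource(new StringReader(docroot));
        // XPathConstants.STRING yields the string value of the first matching node.
        String result = (String) xxpath.evaluate("//b", inputSource, XPathConstants.STRING);
        System.out.println(result); // prints "sold"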
    }

}
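
If more than one node can match, the same API can return a node set instead of a single string. The following is only a sketch under a few assumptions: the class name XPathNodeSetDemo, the helper selectText, and the extra <b>shipped</b> element are made up for illustration, and the input must still be well-formed XML. It parses the string into a DOM Document once, so several expressions can be evaluated against it (an InputSource backed by a StringReader can only be read once).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative, not from the answer above.
final class XPathNodeSetDemo {

    // Parses well-formed markup and returns the text content of every node matched by expr.
    static List<String> selectText(String markup, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // NODESET returns all matches rather than only the first one.
            NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            // Checked parser and XPath exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException("Could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        String docroot = "<div><i>items <b>sold</b></i><b>shipped</b></div>";
        System.out.println(selectText(docroot, "//b")); // prints [sold, shipped]
        System.out.println(selectText(docroot, "//i")); // prints [items sold]
    }
}
```

This still goes through the strict XML parser, so it does not help with HTML that is not well-formed.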

Upvotes: 3

orangepips

Reputation: 9971

You want a Java HTML parsing library that can produce a valid XML Document object. Based on this unscientific library comparison, it appears HTML Cleaner would do the trick.

From the HTML Cleaner site:

Although the main motive was to prepare ordinary HTML for XML processing with XPath, XQuery and XSLT, structured data produced by HtmlCleaner may be consumed and handled in many other ways.

This documentation link provides an example of how to read in an HTML string, execute an XPath query, and work with the results.

Upvotes: 2

Thorbjørn Ravn Andersen

Reputation: 75426

You need a parser that is lenient enough to parse HTML as XML, and those are rare. I believe TagSoup (http://java-source.net/open-source/html-parsers/tagsoup) can do it, but it has been a long time since I looked at it.


Is there any reason you cannot just provide an XHTML snippet?
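
To illustrate the difference, here is a rough sketch using only the JDK's own XML parser. The class name WellFormedCheck and the method isWellFormedXml are hypothetical, and the two sample strings are invented. It shows that ordinary HTML with an unclosed <br> is rejected, while the XHTML form <br/> parses and could then be passed to XPath.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; it only reports whether the strict JDK XML parser accepts the input.
final class WellFormedCheck {

    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Raised when the markup is not well-formed XML.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<div>items<br>sold</div>";   // valid HTML, not well-formed XML
        String xhtml = "<div>items<br/>sold</div>"; // XHTML-style, well-formed
        System.out.println(isWellFormedXml(html));  // prints false
        System.out.println(isWellFormedXml(xhtml)); // prints true
    }
}
```

A lenient parser such as the ones mentioned in this thread is meant to handle the first kind of input before XPath is applied.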

Upvotes: 2
