Parsing href out of an HTML document with XPath throws a NullPointerException

Question

I want to extract the URLs from a specific part of a website. For this purpose I wrote a simple program in Java, but it throws a NullPointerException. It seems that getNamedItem("href") returns null. I suspect that I am using getNamedItem incorrectly to extract the URLs from the "href" attribute.

DocumentBuilder b = DocumentBuilderFactory.newInstance().newDocumentBuilder();
org.w3c.dom.Document doc = b.parse(new FileInputStream("clean.html"));

//Evaluate XPath against Document itself
javax.xml.xpath.XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList)xPath.evaluate(".//*[@class='r_news_box']",
        doc.getDocumentElement(), XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); ++i) {
    Element e = (Element) nodes.item(i);
    System.out.println(e.getAttributes().getNamedItem("href").getTextContent());
}

P.S. Here is one of the nodes that this XPath should select:

har07 · Accepted Answer

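Possibly because not all of the selected nodes have an href attribute. getNamedItem("href") returns null for an element without that attribute, so the chained getTextContent() call throws. You may want to change your XPath so that only elements with an href attribute are returned: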

.//*[@class='r_news_box' and @href]
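
As a rough illustration of that filter, here is a minimal sketch. The class name HrefFilterSketch, the method hrefsOfBoxes, and the sample markup are made up for this example and are not taken from the question; it also assumes the input is well-formed XML, as the DocumentBuilder parser requires.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, for illustration only.
class HrefFilterSketch {

    // Collects href values from elements that have class 'r_news_box'
    // and also carry an href attribute themselves.
    static List<String> hrefsOfBoxes(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xPath.evaluate(
                ".//*[@class='r_news_box' and @href]",
                doc.getDocumentElement(), XPathConstants.NODESET);

        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            // The XPath predicate guarantees the attribute is present here.
            hrefs.add(e.getAttribute("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input: one matching element with href, one without.
        String xml = "<root>"
                + "<a class='r_news_box' href='/news/1'>one</a>"
                + "<div class='r_news_box'>no href here</div>"
                + "</root>";
        System.out.println(hrefsOfBoxes(xml)); // prints [/news/1]
    }
}
```

The div without an href is never returned, so nothing in the loop can dereference a missing attribute.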

UPDATE:

According to your update, href is an attribute of an a element nested inside the element whose class attribute equals r_news_box, so here is the corrected XPath:

.//*[@class='r_news_box']/a[@href]
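
A possible way to apply the corrected XPath is sketched below. The class name NestedHrefSketch, the method linkTargets, and the sample markup are assumptions made for this example, since the node from the question is not shown here; the input is again assumed to be well-formed XML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, for illustration only.
class NestedHrefSketch {

    // Collects href values from <a> elements that are direct children
    // of any element with class 'r_news_box'.
    static List<String> linkTargets(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList links = (NodeList) xPath.evaluate(
                ".//*[@class='r_news_box']/a[@href]",
                doc.getDocumentElement(), XPathConstants.NODESET);

        List<String> targets = new ArrayList<>();
        for (int i = 0; i < links.getLength(); i++) {
            // Each selected node is an <a> element that has an href attribute.
            Element link = (Element) links.item(i);
            targets.add(link.getAttribute("href"));
        }
        return targets;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input; the real page structure may differ.
        String xml = "<root>"
                + "<div class='r_news_box'><a href='/news/1'>first</a></div>"
                + "<div class='r_news_box'><a>no link</a></div>"
                + "<div class='other'><a href='/ignored'>skipped</a></div>"
                + "</root>";
        System.out.println(linkTargets(xml)); // prints [/news/1]
    }
}
```

Element.getAttribute("href") returns an empty string rather than null when the attribute is absent, so it is a somewhat safer choice than getAttributes().getNamedItem("href") even without the [@href] predicate.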
