Reputation: 1879
I want to extract the URLs from a specific part of a website, so I wrote a simple Java program for this. However, the program throws a NullPointerException. It seems that getNamedItem("href")
returns null. I suspect I am using getNamedItem incorrectly to extract the URLs stored in the "href" attribute.
DocumentBuilder b = DocumentBuilderFactory.newInstance().newDocumentBuilder();
org.w3c.dom.Document doc = b.parse(new FileInputStream("clean.html"));
// Evaluate the XPath against the document itself
javax.xml.xpath.XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xPath.evaluate(".//*[@class='r_news_box']",
        doc.getDocumentElement(), XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); ++i) {
    Element e = (Element) nodes.item(i);
    System.out.println(e.getAttributes().getNamedItem("href").getTextContent());
}
P.S.: Here is one of the nodes that this XPath should select:
<div class="r_news_box">
<a class="picLink" target="_blank" href="/fa/news/427583/test">
<img class="r_news_img" width="50" height="65" src="/files/fa/news/1393/5/29/411217_553.jpg" alt="test"/>
</a>
Upvotes: 0
Views: 283
Reputation: 89325
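This is probably because not every selected node has an href attribute. For such a node, getNamedItem("href") returns null, and calling getTextContent() on that null throws the exception. You may want to change your XPath so that only elements that have an href attribute are returned: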
.//*[@class='r_news_box' and @href]
UPDATE:
According to your update, href is an attribute of the <a> node inside the element whose class attribute equals r_news_box, so here is the corrected XPath:
.//*[@class='r_news_box']/a[@href]
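For illustration, here is a minimal sketch of how the corrected XPath could be used. The class name HrefExtractor, the helper extractHrefs, and the inline sample markup are hypothetical and not taken from your code; the sketch also assumes the input is well-formed XML, as clean.html must already be for DocumentBuilder to parse it. Because the XPath only returns <a> elements that have an href, getAttribute("href") is safe to call on each result.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class HrefExtractor {

    // Hypothetical helper: collects the href of every <a> element that is a
    // direct child of an element whose class attribute equals "r_news_box".
    static List<String> extractHrefs(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));

            XPath xPath = XPathFactory.newInstance().newXPath();
            // Select the <a> children, not the enclosing box element.
            NodeList nodes = (NodeList) xPath.evaluate(
                    ".//*[@class='r_news_box']/a[@href]",
                    doc.getDocumentElement(), XPathConstants.NODESET);

            List<String> hrefs = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); ++i) {
                Element a = (Element) nodes.item(i);
                // The [@href] predicate guarantees the attribute is present.
                hrefs.add(a.getAttribute("href"));
            }
            return hrefs;
        } catch (ParserConfigurationException | SAXException
                | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not extract hrefs", e);
        }
    }

    public static void main(String[] args) {
        // Sample markup modelled on the node from the question, plus one box
        // without a link to show that it is simply skipped.
        String sample = "<html><body>"
                + "<div class=\"r_news_box\">"
                + "<a class=\"picLink\" target=\"_blank\" href=\"/fa/news/427583/test\">"
                + "<img class=\"r_news_img\" alt=\"test\"/>"
                + "</a></div>"
                + "<div class=\"r_news_box\"><span>no link here</span></div>"
                + "</body></html>";
        for (String href : extractHrefs(sample)) {
            System.out.println(href); // prints /fa/news/427583/test
        }
    }
}
```

If you prefer to keep your original XPath and loop, note that Element.getAttribute("href") returns an empty string rather than null when the attribute is missing, so it avoids the exception but still needs a check for empty values.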
Upvotes: 1