Reputation: 181
I am using TagSoup with Java to extract some data, but certain XPath expressions are not working; I just get empty results.
import java.io.BufferedReader;
import java.io.FileReader;
import org.jdom.input.SAXBuilder;
import org.jdom.xpath.XPath;

// Parse the HTML through TagSoup so JDOM receives well-formed SAX events
FileReader frInHtml = new FileReader("doc.html");
BufferedReader brInHtml = new BufferedReader(frInHtml);
SAXBuilder saxBuilder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
org.jdom.Document jdomDocument = saxBuilder.build(brInHtml);
// This one works
XPath xpath = XPath.newInstance("/ns:html[1]/ns:body/ns:div[@class='content']/ns:table/ns:tr/ns:td/ns:h1");
// None of the three expressions below worked (each was tried on its own, in place of the line above)
// XPath xpath = XPath.newInstance("/ns:html/ns:body/ns:div[7]/ns:table/ns:tbody/ns:tr/ns:td/ns:h1");
// XPath xpath = XPath.newInstance("//html//body//div[7]//table//tbody//tr//td//h1");
// XPath xpath = XPath.newInstance("/html/body/div[7]/table/tbody/tr/td/h1");
// Bind the "ns" prefix to the XHTML namespace
xpath.addNamespace("ns", "http://www.w3.org/1999/xhtml");
Upvotes: 0
Views: 676
Reputation: 163595
To debug this, you will need to look at the "equivalent XML" produced by TagSoup. And for us to help you, you will need to show us that equivalent XML.
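As a rough illustration of that kind of inspection, here is a minimal sketch that uses only the JDK. It does not run TagSoup. A small hand-written XHTML string stands in for the parser's output, and the class name `EquivalentXmlDemo`, the helper names, and the sample markup are all assumptions made up for this example. It serializes the parsed tree so the actual element structure is visible, then counts matches for three expressions shaped like the ones in the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class EquivalentXmlDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical stand-in for the "equivalent XML" (not real TagSoup output):
    // every element is in the XHTML namespace and the table has no tbody.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
      + "<div class=\"content\"><table><tr><td><h1>Title</h1></td></tr></table></div>"
      + "</body></html>";

    // Parse with namespace awareness so element names keep their namespace URI.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Serialize the tree back to text so its real structure can be read.
    static String dump(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Count the nodes an XPath 1.0 expression selects, with "ns" bound to XHTML.
    static int count(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "ns".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(dump(doc));
        // Prefixed steps that mirror the dumped structure: 1 match
        System.out.println(count(doc,
            "/ns:html/ns:body/ns:div[@class='content']/ns:table/ns:tr/ns:td/ns:h1"));
        // Unprefixed steps only match elements in no namespace: 0 matches
        System.out.println(count(doc, "/html/body/div/table/tr/td/h1"));
        // A tbody step that is absent from the dumped tree: 0 matches
        System.out.println(count(doc,
            "/ns:html/ns:body/ns:div/ns:table/ns:tbody/ns:tr/ns:td/ns:h1"));
    }
}
```

In this sample, the unprefixed path and the path with a `tbody` step both come back empty, while the prefixed path that follows the dumped structure matches. Whether the same reasons apply to the real `doc.html` can only be settled by dumping the tree TagSoup actually builds for it, including which `div` really sits at position 7.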
Upvotes: 1