Reputation: 38705
I have a table that follows this logic:
<tr class=hiderow><td class=packagename>...</td></tr>
-> this row will not be visible. So the table might contain 100 rows, but if 20 of them have class=hiderow, the user only sees 80 rows on the page. I want to retrieve the names from those 80 rows (not all 100), so I need to parse out the rows that do not have class=hiderow. I know how to obtain every name using jsoup, and I also see this in the documentation:
:not(selector) elements that do not match the selector.
But I am not sure how to use it. Please help.
EDIT: I have figured out how to do it. Please let me know if there is a better way.
EDIT2: Please use the solution below from BalusC. It's much cleaner.
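import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Collects the package names from the rows that are not hidden.
public List<String> obtainPackageName(String urlLink) throws IOException {
    List<String> pdfList = new ArrayList<String>();
    Document doc = Jsoup.parse(new URL(urlLink), 3000); // 3000 ms timeout
    Element table = doc.select("table[id=mastertableid]").first();
    if (table == null) {
        return pdfList; // table not found, nothing to collect
    }
    for (Element row : table.select("tr")) {
        // hasClass checks whole class names rather than substrings
        if (!row.hasClass("hiderow")) {
            Element packageName = row.select("td[class=packagename]").first();
            if (packageName != null) {
                pdfList.add(packageName.text());
            }
        }
    }
    return pdfList; // the original built this list but never returned it
}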
Upvotes: 3
Views: 6007
Reputation: 1108722
You need to apply :not() to the element of interest (tr in your case) and pass into it the selector that the element should not match (.hiderow in your case).
So, this should do:
Document document = Jsoup.connect(urlLink).get();
Elements packagenames = document.select("#mastertableid tr:not(.hiderow) td.packagename");
List<String> pdfList = new ArrayList<String>();
for (Element packagename : packagenames) {
    pdfList.add(packagename.text());
}
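If you want to try the selector without hitting a real page, here is a minimal sketch that runs it against an in-memory HTML string. The class name VisiblePackageNames, the helper visiblePackageNames and the sample package names are made up for illustration; only the selector and the table id come from the question.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class VisiblePackageNames {

    // Hypothetical helper: returns the text of every td.packagename
    // whose parent row does not carry the hiderow class.
    public static List<String> visiblePackageNames(String html) {
        Document document = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        for (Element cell : document.select("#mastertableid tr:not(.hiderow) td.packagename")) {
            names.add(cell.text());
        }
        return names;
    }

    public static void main(String[] args) {
        // Sample markup (assumed): three rows, the middle one hidden.
        String html = "<table id=\"mastertableid\">"
                + "<tr><td class=\"packagename\">alpha</td></tr>"
                + "<tr class=\"hiderow\"><td class=\"packagename\">beta</td></tr>"
                + "<tr><td class=\"packagename\">gamma</td></tr>"
                + "</table>";
        System.out.println(visiblePackageNames(html)); // prints [alpha, gamma]
    }
}
```

The row marked hiderow is skipped because :not(.hiderow) is attached to tr, so the td.packagename part only ever sees cells inside visible rows.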
Upvotes: 8