JSoup: Retrieve element that does not contain a specific attribute

Question

I have a table that follows this logic:

The table displays a list of names.
Every row that contains ... will not be visible.

So the table might contain 100 rows, but if 20 of them have class=hiderow, the user only sees 80 rows on the page. I want to retrieve the names from those 80 rows (not all 100), so I need to parse out the rows that do not have class=hiderow. I know how to obtain every name using jsoup, and I see that the documentation lists :not(selector), which matches elements that do not match the selector, but I am not sure how to use it. Please help.

EDIT I have figured out how to do it. Please let me know if there is a better way.
EDIT2 Please use the solution below from BalusC. It's much cleaner.

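import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public void obtainPackageName(String urlLink) throws IOException {
    List<String> pdfList = new ArrayList<>();
    URL url = new URL(urlLink);
    Document doc = Jsoup.parse(url, 3000); // 3000 ms timeout
    Element table = doc.select("table[id=mastertableid]").first();
    Iterator<Element> rowIter = table.select("tr").iterator();
    while (rowIter.hasNext()) {
        Element row = rowIter.next();
        // Skip rows hidden from the user.
        if (!row.className().contains("hiderow")) {
            Element packageName = row.select("td[class=packagename]").first();
            // Rows without a packagename cell (e.g. header rows) are ignored.
            if (packageName != null) {
                pdfList.add(packageName.text());
            }
        }
    }
}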

BalusC · Accepted Answer

You need to apply :not() to the element of interest (tr in your case) and pass it the CSS selector that the element should not match (.hiderow in your case).

So, this should do:

Document document = Jsoup.connect(urlLink).get();
Elements packagenames = document.select("#mastertableid tr:not(.hiderow) td.packagename");
List<String> pdfList = new ArrayList<>();

for (Element packagename : packagenames) {
    pdfList.add(packagename.text());
}
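
To try the selector without hitting a live page, here is a minimal sketch that parses an inline HTML string instead of fetching urlLink. The class name VisibleRowsDemo, the method visibleNames, and the sample markup are made up for illustration; only the selector itself comes from the answer above. It assumes jsoup is on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class VisibleRowsDemo {

    // Hypothetical helper: returns the text of every td.packagename
    // whose parent row does not carry the hiderow class.
    static List<String> visibleNames(String html) {
        Document document = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        for (Element cell : document.select("#mastertableid tr:not(.hiderow) td.packagename")) {
            names.add(cell.text());
        }
        return names;
    }

    public static void main(String[] args) {
        // Sample markup (an assumption, not the asker's real page).
        String html = "<table id=\"mastertableid\">"
            + "<tr><td class=\"packagename\">alpha</td></tr>"
            + "<tr class=\"hiderow\"><td class=\"packagename\">beta</td></tr>"
            + "<tr class=\"odd hiderow\"><td class=\"packagename\">gamma</td></tr>"
            + "<tr class=\"odd\"><td class=\"packagename\">delta</td></tr>"
            + "</table>";
        System.out.println(visibleNames(html)); // prints [alpha, delta]
    }
}
```

Note that .hiderow matches a whole class token, so a row with class="odd hiderow" is excluded as well. The className().contains("hiderow") check in the question is a substring test, so it would also skip a row whose class merely contains that text, such as a hypothetical class="hiderow2".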
