Thang Pham

Reputation: 38705

JSoup: Retrieve element that does not contain a specific attribute

I have a table that follows this logic:

  1. The table displays a list of names.
  2. Every row of the form <tr class=hiderow><td class=packagename>...</td></tr> is not visible.

So the table might contain 100 rows, but if 20 of them have class=hiderow, the user sees only 80 rows on the page. I want to retrieve the names from those 80 visible rows (not all 100), so I need to parse out the rows that do not have class=hiderow. I know how to obtain every name using jsoup, and I see that the documentation lists :not(selector), which matches elements that do not match the selector, but I am not sure how to use it. Please help.

EDIT: I have figured out how to do it. Please let me know if there is a better way.
EDIT2: Please use the solution below from BalusC. It's much cleaner.

import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public List<String> obtainPackageName(String urlLink) throws IOException {
    List<String> pdfList = new ArrayList<String>();
    URL url = new URL(urlLink);
    Document doc = Jsoup.parse(url, 3000); // 3000 ms timeout
    Element table = doc.select("table[id=mastertableid]").first();
    for (Element row : table.select("tr")) {
        // Skip rows that are hidden via class=hiderow
        if (!row.hasClass("hiderow")) {
            Element packageName = row.select("td[class=packagename]").first();
            if (packageName != null) {
                pdfList.add(packageName.text());
            }
        }
    }
    // Return the collected names so the caller can use them
    return pdfList;
}

Upvotes: 3

Views: 6007

Answers (1)

BalusC

Reputation: 1108722

You need to apply :not() to the element of interest (tr in your case) and pass it an element-relative CSS selector that the element must not match (.hiderow in your case).

So, this should do:

Document document = Jsoup.connect(urlLink).get();
Elements packagenames = document.select("#mastertableid tr:not(.hiderow) td.packagename");
List<String> pdfList = new ArrayList<String>();

for (Element packagename : packagenames) {
    pdfList.add(packagename.text()); 
}
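
To try the selector without hitting a live page, here is a minimal sketch that runs it against an inline HTML string. The class name NotSelectorSketch, the method visibleNames, and the sample rows (alpha, beta, gamma) are made up for illustration; only the table id and class names come from the question. It assumes jsoup is on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class NotSelectorSketch {

    // Hypothetical helper: returns the package names from rows
    // that do NOT carry the hiderow class.
    static List<String> visibleNames(String html) {
        Document document = Jsoup.parse(html);
        List<String> names = new ArrayList<String>();
        // tr:not(.hiderow) keeps only rows without class=hiderow
        for (Element cell : document.select("#mastertableid tr:not(.hiderow) td.packagename")) {
            names.add(cell.text());
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up sample table: the middle row is hidden
        String html =
              "<table id=\"mastertableid\">"
            + "<tr><td class=\"packagename\">alpha</td></tr>"
            + "<tr class=\"hiderow\"><td class=\"packagename\">beta</td></tr>"
            + "<tr><td class=\"packagename\">gamma</td></tr>"
            + "</table>";
        System.out.println(visibleNames(html)); // prints [alpha, gamma]
    }
}
```

The hidden row is dropped by the selector itself, so no per-row class check is needed in Java code.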

Upvotes: 8
