Jesse Schultz

Reputation: 115

Jsoup getting contents of href

I am working on a web scraper using Jsoup and want to pull a link out of a table.

This is what I'm looking at:

<ul class="inline-list indent">
    <li>
        ::marker
        <a href="www.linkhere.com" title="Some Text">Some Other Text</a>
        (Date & Time Stamp)
    </li>
</ul>

I want www.linkhere.com and Some Other Text. I have already figured out how to get Some Other Text, but I can't get www.linkhere.com.

This is what I tried:

Document results = Jsoup.connect(url).get();
String tTable = "li:nth-of-type(1)";

for (Element row : results.select("ul.indent.inline-list:nth-of-type(1)")) {
    Element link = results.select("ul.indent.inline-list:nth-of-type(1) > a").first();

    String tName = row.select(tTable).text();
    String articleLink = link.attr("href");

    System.out.println(tName);
    System.out.println(articleLink);
}

This gives me the error:

NullPointerException: Cannot invoke "org.jsoup.nodes.Element.attr(String)" because "link" is null

Upvotes: 0

Views: 235

Answers (1)

Krystian G

Reputation: 2941

You're using this selector:

"ul.indent.inline-list:nth-of-type(1) > a"

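The first part, ul.indent.inline-list:nth-of-type(1), selects the first <ul> element. The second part, > a, requires the <a> to be a direct child of that <ul>. That doesn't match your markup because there is an <li> element between them, so select() returns an empty result, first() returns null, and calling attr("href") on it throws the NullPointerException. The solution is to include the <li> in the selector: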

"ul.indent.inline-list:nth-of-type(1) > li > a"

Or, if your intent was to match the first <li>, use:

"ul.indent.inline-list > li:nth-of-type(1) > a"

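To check the difference between the two selectors without hitting the real page, here is a minimal sketch that parses an inline copy of the markup from the question. The class name LinkExtractor, the helper hrefOf, and the constant names are made up for illustration, and the sample HTML assumes the <ul> is closed properly; your real page may be structured differently.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkExtractor {
    // Sample markup modeled on the question (assumption: the real page closes the <ul>).
    static final String HTML =
        "<ul class=\"inline-list indent\">"
      + "<li><a href=\"www.linkhere.com\" title=\"Some Text\">Some Other Text</a>"
      + " (Date &amp; Time Stamp)</li>"
      + "</ul>";

    // Selector from the question: requires <a> to be a direct child of <ul>.
    static final String DIRECT_CHILD = "ul.indent.inline-list:nth-of-type(1) > a";
    // Selector that goes through the <li> sitting between <ul> and <a>.
    static final String VIA_LI = "ul.indent.inline-list > li:nth-of-type(1) > a";

    // Returns the href of the first match, or null when the selector matches nothing.
    static String hrefOf(String html, String selector) {
        Document doc = Jsoup.parse(html);
        Element link = doc.select(selector).first(); // first() is null on an empty result
        return link == null ? null : link.attr("href");
    }

    public static void main(String[] args) {
        System.out.println(hrefOf(HTML, DIRECT_CHILD)); // no match, so null
        System.out.println(hrefOf(HTML, VIA_LI));       // the href attribute value
    }
}
```

Checking for null before calling attr() also means a page without the expected list gives you a null result you can handle instead of an exception.
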
Upvotes: 1
