The Learner

Reputation: 3927

Getting hyperlinks from a website

I am using Jsoup. I fetch the HTML page with document = connect.get();

Then I convert the document to a string.

The pages are populated by users, and I know each user's name. Since the pages contain the user name, I can call string.contains("username") to check whether a user is present.

Now my problem is that the user names appear in several places in the page body: in tables, in ordered lists, and in unordered lists.

In all of these cases, though, the name is part of a link with the same format, for example:

    <li><a href="http://university.xxx.students.com/grade9/john/117429">2012 academic record</a></li>

Some of these links are in tables, some are in lists, and so on.

In this example I know the student name is john. How can I get all of the URLs for that student?


Upvotes: 0

Views: 91

Answers (2)

Sreenath S

Reputation: 1269

How about this:

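    import java.util.ArrayList;
    import java.util.List;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    Document doc = Jsoup.connect(url).get();
    // every <a> with an href, whether it sits in a table, a list or directly in the body
    Elements links = doc.select("a[href]");
    List<String> studentLinkList = new ArrayList<>();
    for (Element link : links) {
        String href = link.attr("abs:href"); // href resolved to an absolute URL
        if (href.contains(studentName) || link.text().contains(studentName)) {
            studentLinkList.add(href);
        }
    }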
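To try the same filtering rule without a network connection (or jsoup on the classpath), here is a minimal sketch that applies it to plain strings. The Link record is a hypothetical stand-in for jsoup's Element (records need Java 16 or later), and the mary and johnson URLs are made-up samples; only the john URL comes from the question.

```java
import java.util.ArrayList;
import java.util.List;

public class LinkFilter {
    // Hypothetical stand-in for a jsoup Element: absolute href plus link text.
    record Link(String absHref, String text) {}

    // Same test as the answer: keep the link if the name occurs in the URL or in the text.
    static List<String> linksFor(List<Link> links, String studentName) {
        List<String> result = new ArrayList<>();
        for (Link link : links) {
            if (link.absHref().contains(studentName) || link.text().contains(studentName)) {
                result.add(link.absHref());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Link> links = List.of(
            new Link("http://university.xxx.students.com/grade9/john/117429", "2012 academic record"),
            new Link("http://university.xxx.students.com/grade9/mary/117430", "2012 academic record"),
            new Link("http://university.xxx.students.com/grade9/johnson/117431", "2012 academic record"));
        System.out.println(linksFor(links, "john"));
    }
}
```

With the name john this prints both the john URL and the johnson URL, because contains() matches any substring. Matching the whole path segment, as the regex-based answer below does, avoids that.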

Upvotes: 0

ollo

Reputation: 25380

You can use regex for this:

    Elements elements = document.select("[href~=(?is)http://university\\.xxx\\.students\\.com/grade9/(.+?)/[0-9]+?]");

More generally, the form is document.select("a[href~=regex]"), where regex is a Java regular expression tested against the href value.
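The capturing group (.+?) in the first selector can also be used to recover the name itself. Below is a minimal sketch with plain java.util.regex on URL strings, so it runs without jsoup; the pattern is the one from this answer without the [href~=...] wrapper, and the mary and example.com URLs are hypothetical samples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StudentNameExtractor {
    // Same regex as in the answer, minus the jsoup [href~=...] selector wrapper.
    private static final Pattern STUDENT_URL = Pattern.compile(
        "(?is)http://university\\.xxx\\.students\\.com/grade9/(.+?)/[0-9]+?");

    // Returns the captured name (group 1) for every URL in which the pattern is found.
    static List<String> namesIn(List<String> urls) {
        List<String> names = new ArrayList<>();
        for (String url : urls) {
            Matcher m = STUDENT_URL.matcher(url);
            if (m.find()) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
            "http://university.xxx.students.com/grade9/john/117429",
            "http://university.xxx.students.com/grade9/mary/117430",
            "http://example.com/about");
        System.out.println(namesIn(urls)); // [john, mary]
    }
}
```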

If you already know the name, you can put it in place of (.+?), e.g.:

    Elements elements = document.select("[href~=(?is)http://university\\.xxx\\.students\\.com/grade9/" + name + "/[0-9]+?]");
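A minimal java.util.regex sketch of the known-name variant, run on plain URL strings instead of a jsoup Document. It wraps the name in Pattern.quote so characters such as . or + in a name are matched literally; that is an addition to the answer, and I have not checked how jsoup's selector parser handles the resulting \Q...\E inside a [href~=...] query. The johnson URL is a hypothetical sample.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class KnownStudentFilter {
    // Builds the answer's pattern for one known name; Pattern.quote makes the name literal.
    static Pattern patternFor(String name) {
        return Pattern.compile("(?is)http://university\\.xxx\\.students\\.com/grade9/"
            + Pattern.quote(name) + "/[0-9]+?");
    }

    // Keeps only the URLs in which the pattern is found.
    static List<String> urlsFor(List<String> urls, String name) {
        Pattern p = patternFor(name);
        return urls.stream()
            .filter(u -> p.matcher(u).find())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
            "http://university.xxx.students.com/grade9/john/117429",
            "http://university.xxx.students.com/grade9/johnson/117431");
        System.out.println(urlsFor(urls, "john"));
    }
}
```

Only the john URL is printed: the / required after the name rules out johnson, which a plain contains("john") check would accept.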

Upvotes: 1
