Reputation: 3927
I am using Jsoup.
I fetch the HTML page with document = connect.get();
and then write it to a string.
The pages are populated by users, and I know each user's name. The pages contain the usernames, so I can check whether a user is present with string.contains("username").
Now my issue is: the users' names appear in links that can sit in
Tables
Ordered lists
Unordered lists
The body
In all of these cases the links have the same format, for example:
<li><a href="http://university.xxx.students.com/grade9/john/117429">2012 academic record</a></li>
Some are in tables, some in lists, and so on.
In the example I know the student name is john. How can I get all the URLs for that student?
==
Upvotes: 0
Views: 91
Reputation: 1269
How about this:
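List<String> studentLinkList = new ArrayList<>();
Document doc = Jsoup.connect(url).get();
// Select every anchor that has an href attribute, wherever it sits in the page
Elements links = doc.select("a[href]");
for (Element link : links) {
    // abs:href resolves relative links against the page URL
    String href = link.attr("abs:href");
    if (href.contains(studentName) || link.text().contains(studentName)) {
        studentLinkList.add(href);
    }
}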
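One caveat with a plain contains check is that john also matches a URL for johnson. The sketch below is a stricter filter that only uses the standard library. It assumes the name always appears as one whole path segment, as in the question's example URL. The class and method names are hypothetical, and the johnson URL is made up for illustration. It works on href strings that have already been extracted (for example with link.attr("abs:href")) and does not look at the link text.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of Jsoup.
class StudentLinkFilter {

    // True when studentName is exactly one path segment of the URL (case-sensitive).
    static boolean hasPathSegment(String url, String studentName) {
        String path;
        try {
            path = URI.create(url).getPath();
        } catch (IllegalArgumentException e) {
            return false; // not a parseable URL
        }
        if (path == null) {
            return false; // opaque URI such as mailto:, which has no path
        }
        return Arrays.asList(path.split("/")).contains(studentName);
    }

    // Keeps only the URLs that contain studentName as a whole path segment.
    static List<String> filter(List<String> urls, String studentName) {
        List<String> result = new ArrayList<>();
        for (String url : urls) {
            if (hasPathSegment(url, studentName)) {
                result.add(url);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
            "http://university.xxx.students.com/grade9/john/117429",
            "http://university.xxx.students.com/grade9/johnson/555"); // made-up URL
        System.out.println(filter(urls, "john"));
    }
}
```

Inside the loop above, the condition could then call StudentLinkFilter.hasPathSegment(href, studentName) instead of href.contains(studentName).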
Upvotes: 0
Reputation: 25380
You can use a regex for this:
Elements elements = document.select("[href~=(?is)http://university\\.xxx\\.students\\.com/grade9/(.+?)/[0-9]+?]");
More generally: document.select("a[href~=regex]")
If you already know the name, you can replace (.+?) with it, e.g.:
Elements elements = document.select("[href~=(?is)http://university\\.xxx\\.students\\.com/grade9/" + name + "/[0-9]+?]");
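If the name comes from user data, it may contain regex metacharacters such as a dot, so it is safer to quote it before concatenating it into the pattern. The sketch below shows this with java.util.regex alone, applied to href strings that have already been extracted. The class and method names are hypothetical. It assumes every student URL follows the /grade9/<name>/<digits> layout from the question's example, and it anchors the pattern so the whole URL has to match.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Jsoup.
class StudentUrlPattern {

    // Prefix copied from the question's example URL.
    static final String PREFIX = "http://university.xxx.students.com/grade9/";

    // Builds an anchored, case-insensitive pattern for one student's URLs.
    // Pattern.quote makes the prefix and the name match literally.
    static Pattern forStudent(String name) {
        return Pattern.compile(
            "^" + Pattern.quote(PREFIX) + Pattern.quote(name) + "/[0-9]+$",
            Pattern.CASE_INSENSITIVE);
    }

    // True when url is PREFIX + name + "/" + one or more digits.
    static boolean matches(String url, String name) {
        return forStudent(name).matcher(url).find();
    }

    public static void main(String[] args) {
        String url = "http://university.xxx.students.com/grade9/john/117429";
        System.out.println(matches(url, "john"));
    }
}
```

Without the quoting, a name such as j.hn would also match john, because the dot matches any character.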
Upvotes: 1