Reputation: 439
I am completely lost and confused when using Jsoup to parse this HTML document.
I don't mean to just ask for straight-up code, but if someone has the time or can get me started, that would be great.
Here is the document: http://radar.weather.gov/ridge/RadarImg/N0R/ILN/
If you view the source I am trying to fetch these lines:
<tr><td valign="top"><img src="/icons/image2.gif" alt="[IMG]"></td><td><a href="ILN_20140112_0021_N0R.gif">ILN_20140112_0021_N0R.gif</a></td><td align="right">12-Jan-2014 00:23 </td><td align="right">2.2K</td><td> </td></tr>
As you can see, there are many of these. I need the value of the href attribute in each <a href=...> tag, and I only need it for the first ten of those lines.
As I said, if anyone has the time to help me out, it would be greatly appreciated!
Upvotes: 1
Views: 169
Reputation: 1243
Edit: Refer to @ashatte's solution instead.
import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Or whatever your link is; 3000 is the timeout in milliseconds
Document doc = Jsoup.parse(
        new URL("http://radar.weather.gov/ridge/RadarImg/N0R/ILN/"),
        3000);
// Counter used to ignore the top 2 rows
int ignoreCount = 0;
// Selects the <tr> elements, so item is a single <tr>
for (Element item : doc.select("tr")) {
    if (ignoreCount > 1) {
        // Selects the first <a> element in the row
        Element link = item.select("a").first();
        if (link != null && link.hasAttr("href")) {
            // Fetches the href attribute from the selected <a>
            String href = link.attr("href");
            System.out.println(href);
        }
    }
    ignoreCount++;
}
This is just one way to do it among many. I strongly suggest you read through the Jsoup cookbook.
Upvotes: 0
Reputation: 5538
First you need to store the contents of the HTML into a Document (explained more here):
String url = "http://radar.weather.gov/ridge/RadarImg/N0R/ILN/";
Document doc = Jsoup.connect(url).get();
Next, select the Elements that you want from the Document (see here). The following line selects all <a> elements that have an href attribute and whose text contains the String "gif":
Elements links = doc.select("a[href]:contains(gif)");
Then, to print out the value from the first ten, you could just use a loop. The attr() method allows you to extract only the value of a certain attribute, rather than the complete HTML or its text:
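// Math.min guards against pages that list fewer than ten matching links
for (int i = 0; i < Math.min(10, links.size()); i++) {
    System.out.println(links.get(i).attr("href"));
}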
The output is:
ILN_20140112_0221_N0R.gif
ILN_20140112_0227_N0R.gif
ILN_20140112_0232_N0R.gif
ILN_20140112_0237_N0R.gif
ILN_20140112_0242_N0R.gif
ILN_20140112_0248_N0R.gif
ILN_20140112_0253_N0R.gif
ILN_20140112_0258_N0R.gif
ILN_20140112_0303_N0R.gif
ILN_20140112_0308_N0R.gif
This is essentially the basic methodology for most of the parsing you will do in Jsoup. You should have a go at extracting some other Elements from the page (use this page for reference).
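As a rough sketch of the same steps in one place, the class below parses an inline HTML string instead of fetching the live page, so it can be run offline. Note that it swaps in a different selector, a[href$=.gif], which matches on the href attribute ending in ".gif" rather than on the link text. The class name GifLinkSketch, the helper firstGifHrefs, and the sample rows are made up for illustration; only the first file name comes from the question.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class GifLinkSketch {

    // Hypothetical helper: returns up to `limit` href values taken from
    // <a> elements whose href attribute ends in ".gif".
    public static List<String> firstGifHrefs(String html, int limit) {
        Document doc = Jsoup.parse(html);
        Elements links = doc.select("a[href$=.gif]");
        List<String> hrefs = new ArrayList<>();
        for (Element link : links) {
            if (hrefs.size() >= limit) {
                break; // stop once we have collected enough values
            }
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Illustrative sample modeled on the directory listing in the question
        String html = "<table>"
                + "<tr><th><a href=\"?C=N;O=D\">Name</a></th></tr>"
                + "<tr><td><a href=\"/ridge/RadarImg/N0R/\">Parent Directory</a></td></tr>"
                + "<tr><td><a href=\"ILN_20140112_0021_N0R.gif\">ILN_20140112_0021_N0R.gif</a></td></tr>"
                + "<tr><td><a href=\"ILN_20140112_0027_N0R.gif\">ILN_20140112_0027_N0R.gif</a></td></tr>"
                + "</table>";
        for (String href : firstGifHrefs(html, 10)) {
            System.out.println(href);
        }
    }
}
```

Matching on the attribute means the header and "Parent Directory" links are skipped without counting rows, and collecting into a list avoids an out-of-range index when the page has fewer than ten images.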
Upvotes: 2
Reputation: 8729
Try this:
String html = "<tr><td><img src='/icons/image2.gif' alt='[IMG]'></td><td><a href='ILN_20140112_0021_N0R.gif'>ILN_20140112_0021_N0R.gif</a></td><td align='right'>12-Jan-2014 00:23</td><td align='right'>2.2K</td><td> </td></tr>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first();
// value will be "ILN_20140112_0021_N0R.gif"
String value = link.attr("href");
// link.text() returns the same string here, because the link text matches the href
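The value extracted above is a relative file name, so if the goal is to download the image it probably needs to be resolved against the URL of the page it came from. Here is a minimal sketch using only java.net.URI from the standard library; the class name ResolveHrefSketch and the helper resolve are hypothetical names chosen for this example.

```java
import java.net.URI;

public class ResolveHrefSketch {

    // Hypothetical helper: resolves a (possibly relative) href against
    // the URL of the page the link was found on.
    public static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String pageUrl = "http://radar.weather.gov/ridge/RadarImg/N0R/ILN/";
        // Prints the page URL followed by the file name
        System.out.println(resolve(pageUrl, "ILN_20140112_0021_N0R.gif"));
    }
}
```

When the Document was loaded with Jsoup.connect(url).get(), Jsoup already knows the base URL, so link.absUrl("href") should give the same result without the extra helper.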
Upvotes: 0