Reputation: 439

Android using JSOUP for HTML

I am completely lost when trying to use jsoup to parse this HTML document.

I don't mean to just ask for finished code, but if someone has the time to get me started, that would be great.

Here is the document: http://radar.weather.gov/ridge/RadarImg/N0R/ILN/

If you view the source, I am trying to fetch lines like this one:

<tr><td valign="top"><img src="/icons/image2.gif" alt="[IMG]"></td><td><a href="ILN_20140112_0021_N0R.gif">ILN_20140112_0021_N0R.gif</a></td><td align="right">12-Jan-2014 00:23  </td><td align="right">2.2K</td><td>&nbsp;</td></tr>

As you can see, there are many of these rows. I need the value of the href attribute in

<a href=

and I need that value from the first ten of those rows.

As I said, if anyone has the time to help me out, it would be greatly appreciated!

Upvotes: 1

Answers (3)

chariot423

Reputation: 1243

Edit: Refer to @ashatte's solution instead.

// Requires: java.net.URL, org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element
Document doc = Jsoup.parse(
        new URL("http://radar.weather.gov/ridge/RadarImg/N0R/ILN/"),
        3000); // or whatever your link is; 3000 is the timeout in milliseconds

int ignoreCount = 0; // counter used to skip the top 2 rows
for (Element item : doc.select("tr")) { // each item is a single <tr>
    if (ignoreCount > 1) {
        Element link = item.select("a").first(); // first <a> in the row, or null if there is none
        if (link != null && link.hasAttr("href")) {
            String href = link.attr("href"); // value of the href attribute of that <a>
            // use href here
        }
    }
    ignoreCount++;
}

This is just one way to do it among many. I strongly suggest you read through the jsoup cookbook.

Upvotes: 0

ashatte

Reputation: 5538

First, you need to load the HTML into a Document (explained more here):

String url = "http://radar.weather.gov/ridge/RadarImg/N0R/ILN/";    
Document doc = Jsoup.connect(url).get();

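Next, select the Elements you want from the Document (see here). The following line selects all <a> elements that have an href attribute and whose link text contains the string "gif". To match on the attribute value itself, a selector such as a[href$=.gif] could be used instead: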

Elements links = doc.select("a[href]:contains(gif)");

Then, to print the value from the first ten, you can use a loop. The attr() method extracts only the value of a given attribute, rather than the complete HTML or its text. Bounding the loop by links.size() avoids an IndexOutOfBoundsException if fewer than ten links are found:

for (int i = 0; i < Math.min(10, links.size()); i++) {
    System.out.println(links.get(i).attr("href"));
}

The output is:

ILN_20140112_0221_N0R.gif
ILN_20140112_0227_N0R.gif
ILN_20140112_0232_N0R.gif
ILN_20140112_0237_N0R.gif
ILN_20140112_0242_N0R.gif
ILN_20140112_0248_N0R.gif
ILN_20140112_0253_N0R.gif
ILN_20140112_0258_N0R.gif
ILN_20140112_0303_N0R.gif
ILN_20140112_0308_N0R.gif

This is essentially the basic methodology for most of the parsing you will do in Jsoup. You should have a go at extracting some other Elements from the page (use this page for reference).
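
The href values printed above are relative file names, so they need to be resolved against the directory URL before the images can be downloaded. Below is a minimal sketch of that step using only java.net.URI; the class name RadarLinks and the method resolveFirst are hypothetical names chosen for illustration, and the list of hrefs is assumed to come from the jsoup loop above. jsoup can also return an absolute URL directly through link.absUrl("href") when the Document was loaded from a URL.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of jsoup.
class RadarLinks {

    // Resolves at most `limit` relative hrefs against the directory URL.
    static List<String> resolveFirst(String baseUrl, List<String> hrefs, int limit) {
        URI base = URI.create(baseUrl);
        List<String> resolved = new ArrayList<>();
        int count = Math.min(limit, hrefs.size()); // never read past the end of the list
        for (int i = 0; i < count; i++) {
            resolved.add(base.resolve(hrefs.get(i)).toString());
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Sample values copied from the output above; in practice they come from attr("href").
        List<String> hrefs = Arrays.asList(
                "ILN_20140112_0221_N0R.gif",
                "ILN_20140112_0227_N0R.gif");
        String base = "http://radar.weather.gov/ridge/RadarImg/N0R/ILN/";
        for (String url : resolveFirst(base, hrefs, 10)) {
            System.out.println(url);
        }
    }
}
```

The trailing slash on the base URL matters: without it, URI.resolve would replace the last path segment instead of appending the file name to it.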

Upvotes: 2

Adnan

Reputation: 8729

Try this:

// The string holds an HTML fragment, not a URL
String html = "<tr><td><img src='/icons/image2.gif' alt='[IMG]'></td><td><a href='ILN_20140112_0021_N0R.gif'>ILN_20140112_0021_N0R.gif</a></td><td align='right'>12-Jan-2014 00:23</td><td align='right'>2.2K</td><td>&nbsp;</td></tr>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first();
// text() returns the link text, which here is the same as the href:
// "ILN_20140112_0021_N0R.gif". To read the attribute itself, use link.attr("href").
String value = link.text();

Upvotes: 0
