Faber

Reputation: 380

Cannot extract data from an XML

I'm using the getElementsByTag method to extract data from the following XML document (Yahoo Finance news feed, http://finance.yahoo.com/rss/topfinstories).


I'm using the following code. It gets the news items and the titles with no problem using the getElementsByTag method, but for some reason it won't pick up the link when searched by tag. It only picks up the closing tag for the link element. Is it a problem with the XML document or a problem with jsoup?

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class GetNewsXML {
    /**
     * @param args
     */
    public static void main(String args[]) {
        Document doc = null;
        String con = "http://finance.yahoo.com/rss/topfinstories";
        try {
            doc = Jsoup.connect(con).get();
        } catch (IOException e) {
            e.printStackTrace();
            return; // nothing to parse if the request failed
        }
        Elements collection = doc.getElementsByTag("item"); // gets each news item
        for (Element c : collection) {
            System.out.println(c.getElementsByTag("title"));
        }
        for (Element c : collection) {
            System.out.println(c.getElementsByTag("link")); // this is the call that does not return the URL
        }
    }
}
Upvotes: 0

Views: 1323

Answers (1)

ollo

Reputation: 25370

You get <link /> http://...; jsoup's default HTML parser treats <link> as an empty (self-closing) element, so the URL is placed after the link tag as a TextNode.

But this is not a problem:

final String url = "http://finance.yahoo.com/rss/topfinstories";

Document doc = Jsoup.connect(url).get();


for( Element item : doc.select("item") )
{
    final String title = item.select("title").first().text();
    final String description = item.select("description").first().text();
    final String link = item.select("link").first().nextSibling().toString();

    System.out.println(title);
    System.out.println(description);
    System.out.println(link);
    System.out.println("");
}

Explanation:

item.select("link")  // Select the 'link' element of the item
    .first()         // Retrieve the first Element found (since there's only one)
    .nextSibling()   // Get the next sibling after the one found; it's the TextNode with the real URL
    .toString()      // Get it as a String

With your URL, this example prints every item like this:

Tax Day Freebies and Deals
You made it through tax season. Reward yourself by taking advantage of some special deals on April 15.
http://us.rd.yahoo.com/finance/news/rss/story/SIG=14eetvku9/*http%3A//us.rd.yahoo.com/finance/news/topfinstories/SIG=12btdp321/*http%3A//finance.yahoo.com/news/tax-day-freebies-and-deals-133544366.html?l=1

(...)
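
To check whether the feed itself is at fault, one option is to run the same extraction through a strict XML parser. The sketch below is not jsoup: it uses the JDK's built-in DOM parser, and a made-up two-item feed (SAMPLE_FEED, with example.com URLs) stands in for the live Yahoo response. The class and method names are only illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RssLinkDemo {
    // Hypothetical feed used for illustration; not the real Yahoo response.
    static final String SAMPLE_FEED =
        "<rss version=\"2.0\"><channel>"
        + "<title>Sample feed</title>"
        + "<link>http://example.com/</link>"
        + "<item><title>First story</title><link>http://example.com/first</link></item>"
        + "<item><title>Second story</title><link>http://example.com/second</link></item>"
        + "</channel></rss>";

    // Returns the text of the <link> element inside every <item>.
    static List<String> extractLinks(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        List<String> links = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            // With an XML parser the URL is the content of <link>, not a sibling text node.
            links.add(item.getElementsByTagName("link").item(0).getTextContent().trim());
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        for (String link : extractLinks(SAMPLE_FEED)) {
            System.out.println(link);
        }
    }
}
```

If the links come out intact here, the document is fine and the behaviour comes from HTML-mode parsing. As far as I know, jsoup also has an XML mode (Parser.xmlParser(), passed via Jsoup.connect(url).parser(...)), in which item.select("link").first().text() should return the URL directly.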

Upvotes: 1
