Android: How to download RSS when a website contains: link rel="alternate" type="application/rss+xml"

Question

I am making an RSS-related app.
I want to be able to download the RSS (XML) feed given only the URL of a website whose page contains:

link rel="alternate" type="application/rss+xml"

For example, the page source of http://www.engadget.com contains such a link element.

I assume that if I open this site in an RSS application,
it will redirect me to the http://www.engadget.com/rss.xml page.

My code to download the XML is as follows:

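import android.content.Context;

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

private boolean downloadXml(String url, String filename) {
    InputStream is = null;
    OutputStream out = null;
    try {
        URL urlxml = new URL(url);
        URLConnection ucon = urlxml.openConnection();
        ucon.setConnectTimeout(4000);
        ucon.setReadTimeout(4000);
        is = new BufferedInputStream(ucon.getInputStream());
        // MODE_PRIVATE: MODE_WORLD_READABLE/MODE_WORLD_WRITEABLE are deprecated and
        // throw a SecurityException in apps targeting Android 7.0 (API 24) or higher
        out = openFileOutput(filename + ".xml", Context.MODE_PRIVATE);
        // Copy the raw bytes; writing them through an OutputStreamWriter would
        // re-encode each byte as a character and corrupt non-ASCII content
        byte[] buffer = new byte[4096];
        int n;
        while ((n = is.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        out.flush();
        return true;
    } catch (Exception e) {
        return false;
    } finally {
        // Release the connection and file handles even when the download fails
        try { if (is != null) is.close(); } catch (IOException ignored) { }
        try { if (out != null) out.close(); } catch (IOException ignored) { }
    }
}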

Without knowing the http://www.engadget.com/rss.xml URL in advance, how can I download the RSS feed when I input only http://www.engadget.com?

creemama · Accepted Answer

To accomplish this, you need to:

Detect whether the URL points to an HTML file. See the isHtml method in the code below.
If the URL points to an HTML file, extract an RSS URL from it. See the extractRssUrl method in the code below.
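Before (or instead of) sniffing the body, you can also look at the response's Content-Type header, which URLConnection exposes through getContentType(). The sketch below is a hypothetical helper (FeedContentTypes and classify are illustrative names, not a library API). It assumes the server labels its responses honestly, which is not always the case: many feeds are served as text/xml or even text/html, so a body check is still needed as a fallback.

```java
import java.util.Locale;

// Hypothetical helper: classifies a Content-Type header value, such as the
// result of URLConnection.getContentType(). Servers do not always label
// feeds correctly, so treat the result as a hint, not a guarantee.
class FeedContentTypes {

    enum Kind { HTML, FEED, UNKNOWN }

    static Kind classify(String contentType) {
        if (contentType == null) {
            return Kind.UNKNOWN;
        }
        // Drop parameters such as "; charset=UTF-8" and normalize the case
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        switch (mime) {
            case "text/html":
            case "application/xhtml+xml":
                return Kind.HTML;
            case "application/rss+xml":
            case "application/atom+xml":
                return Kind.FEED;
            default:
                // text/xml, application/xml, etc. may or may not be a feed
                return Kind.UNKNOWN;
        }
    }
}
```

In downloadXml, this could be called with ucon.getContentType() right after ucon.getInputStream(), falling back to isHtml(str) when the result is UNKNOWN.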

The following code is a modified version of the code you pasted in your question. For I/O, I used Apache Commons IO for the useful IOUtils and FileUtils classes. IOUtils.toString is used to convert an input stream to a string, as recommended in the article "In Java, how do I read/convert an InputStream to a String?"
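If you would rather not add Commons IO to an Android project just for IOUtils.toString, the same conversion can be sketched with the JDK alone. StreamText.readAll is a hypothetical name; like the code below, it assumes you know the response's character set (UTF-8 in the answer).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper mirroring IOUtils.toString(is, "UTF-8") using only the JDK.
// It does not close the stream; the caller remains responsible for that.
class StreamText {

    static String readAll(InputStream is, String charsetName) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        // Collect all raw bytes first so multi-byte characters are never split
        while ((n = is.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        // Decode once at the end using the requested charset
        return buffer.toString(charsetName);
    }
}
```

Decoding once at the end, rather than chunk by chunk, avoids garbling a UTF-8 character whose bytes straddle two reads.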

extractRssUrl uses regular expressions to parse HTML, even though doing so is widely frowned upon. (See the rant in "RegEx match open tags except XHTML self-contained tags.") With this in mind, treat extractRssUrl as a starting point: its regular expression is rudimentary and doesn't cover all cases.
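One way to make the extraction less brittle without pulling in a full HTML parser is to split the job in two: find each <link ...> tag, then read its attributes one by one, so attribute order and quoting style no longer matter. The sketch below does that with two small regular expressions and checks both rel="alternate" and type="application/rss+xml". FeedLinkFinder and findRssHref are hypothetical names, and this is still a heuristic, not an HTML parser (for example, it does not skip tags inside HTML comments).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical heuristic, not an HTML parser: returns the href of the first
// <link> tag whose rel contains "alternate" and whose type is application/rss+xml.
class FeedLinkFinder {

    // One <link ...> tag; group 1 holds everything between "<link" and ">"
    private static final Pattern LINK_TAG =
            Pattern.compile("<link\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // One attribute: name="value", name='value', or name=value
    private static final Pattern ATTRIBUTE = Pattern.compile(
            "([a-zA-Z_:][-a-zA-Z0-9_:.]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))");

    static String findRssHref(String html) {
        Matcher tag = LINK_TAG.matcher(html);
        while (tag.find()) {
            Map<String, String> attrs = parseAttributes(tag.group(1));
            if (hasToken(attrs.get("rel"), "alternate")
                    && "application/rss+xml".equalsIgnoreCase(attrs.get("type"))
                    && attrs.containsKey("href")) {
                return attrs.get("href");
            }
        }
        return null;
    }

    // Attribute names are lower-cased so lookups do not depend on the page's casing
    private static Map<String, String> parseAttributes(String text) {
        Map<String, String> attrs = new HashMap<String, String>();
        Matcher m = ATTRIBUTE.matcher(text);
        while (m.find()) {
            // Exactly one of groups 2-4 is set, depending on the quoting style
            String value = m.group(2) != null ? m.group(2)
                    : m.group(3) != null ? m.group(3) : m.group(4);
            attrs.put(m.group(1).toLowerCase(Locale.ROOT), value);
        }
        return attrs;
    }

    // rel can hold several whitespace-separated tokens, e.g. rel="alternate feed"
    private static boolean hasToken(String value, String token) {
        if (value == null) {
            return false;
        }
        for (String t : value.trim().split("\\s+")) {
            if (t.equalsIgnoreCase(token)) {
                return true;
            }
        }
        return false;
    }
}
```

A relative href such as /rss.xml still has to be resolved against the page URL, for example with new URL(pageUrl, href).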

Note that a call to isRss(str) is commented out. If you want to do RSS detection, see "How to detect if a page is an RSS or ATOM feed."
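If you want to fill in the commented-out isRss(str) check, a rough heuristic is to look at the document's root element: RSS 0.9x/2.0 feeds start with <rss, RSS 1.0 feeds with <rdf:RDF, and Atom feeds with <feed. The sketch below is hypothetical (FeedSniffer is an illustrative name, not taken from the linked question). It skips an optional byte-order mark, the XML declaration, comments, and a DOCTYPE, but it is not a validating parser.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical heuristic: decides whether a document is a feed by its root element.
class FeedSniffer {

    // Optional BOM and whitespace, then any mix of <?...?>, <!--...-->, and
    // <!DOCTYPE ...> prologue items, then the root element name (group 1)
    private static final Pattern ROOT = Pattern.compile(
            "\\uFEFF?\\s*(?:(?:<\\?.*?\\?>|<!--.*?-->|<!DOCTYPE[^>]*>)\\s*)*<([A-Za-z_][-\\w.:]*)",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    static boolean isRss(String str) {
        Matcher m = ROOT.matcher(str);
        // lookingAt() anchors the match at the start of the text
        if (!m.lookingAt()) {
            return false;
        }
        String root = m.group(1).toLowerCase(Locale.ROOT);
        return root.equals("rss") || root.equals("rdf:rdf") || root.equals("feed");
    }
}
```

Checking only the root element keeps the test cheap and avoids false positives from HTML pages that merely mention "rss" somewhere in their text.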

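import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;

// Call this off the main thread: on Android 3.0 (API 11) and later, network I/O on the
// main thread throws NetworkOnMainThreadException, which the catch below turns into false.
private boolean downloadXml(String url, String filename) {
    InputStream is = null;
    try {
        URL urlxml = new URL(url);
        URLConnection ucon = urlxml.openConnection();
        ucon.setConnectTimeout(4000);
        ucon.setReadTimeout(4000);
        is = ucon.getInputStream();
        // Assumes the response is UTF-8 encoded
        String str = IOUtils.toString(is, "UTF-8");
        if (isHtml(str)) {
            String rssURL = extractRssUrl(str);
            if (rssURL != null) {
                // Resolve relative hrefs such as "/rss.xml" against the final page URL
                String absoluteRssUrl = new URL(ucon.getURL(), rssURL).toString();
                // Guard against a page whose feed link points back to itself
                if (!url.equals(absoluteRssUrl)) {
                    return downloadXml(absoluteRssUrl, filename);
                }
            }
        } else { // if (isRss(str)) {
            // For now, we'll assume that we're an RSS feed at this point.
            // Save into the app's private files directory (where openFileOutput writes).
            FileUtils.write(new File(getFilesDir(), filename + ".xml"), str, "UTF-8");
            return true;
        }
    } catch (Exception e) {
        // do nothing
    } finally {
        IOUtils.closeQuietly(is);
    }
    return false;
}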

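private boolean isHtml(String str) {
    // Rudimentary check: treat the response as HTML if it contains an <html tag
    Pattern pattern = Pattern.compile("<html", Pattern.CASE_INSENSITIVE);
    return pattern.matcher(str).find();
}

private String extractRssUrl(String str) {
    // Rudimentary pattern: finds a <link> tag with type="application/rss+xml" and
    // captures its href, whether href appears before (group 1) or after (group 2)
    // the type attribute. It ignores rel and assumes quoted attribute values.
    Pattern pattern = Pattern.compile(
            "<link\\s[^>]*?href=[\"']([^\"']+)[\"'][^>]*?type=[\"']application/rss\\+xml[\"'][^>]*>"
            + "|<link\\s[^>]*?type=[\"']application/rss\\+xml[\"'][^>]*?href=[\"']([^\"']+)[\"'][^>]*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL | Pattern.MULTILINE);
    Matcher matcher = pattern.matcher(str);
    if (matcher.find()) {
        // Only one of the two alternatives matched, so return whichever group is set
        for (int i = 1; i <= matcher.groupCount(); i++) {
            if (matcher.group(i) != null) {
                return matcher.group(i);
            }
        }
    }
    return null;
}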

The above code works with your Engadget example:

obj.downloadXml("http://www.engadget.com/", "rss");
