Ank K

Reputation: 11

ROME API Parse Image URL in CDATA from RSS Feed

The ROME API does not parse the image URL when the URL is given within a CDATA section. For example, the feed at http://www.espn.com/espn/rss/espnu/news contains:

    <image>
    <![CDATA[
    URL of the image
    ]]>
    </image>

I have checked the foreign markup, the enclosures, and the DC modules of the SyndFeed built by SyndFeedInput, but the image URL does not appear in any of them.

The values of other elements, such as Description and Title, are also given within CDATA, and the ROME API parses those correctly.

Code snippet:

    XmlReader xmlReader = null;
    try {
        xmlReader = new XmlReader(new URL("http://www.espn.com/espn/rss/espnu/news"));
        SyndFeedInput input = new SyndFeedInput();
        SyndFeed feed = input.build(xmlReader);
    } catch (Exception e) {
        e.printStackTrace();
    }

Upvotes: 0

Views: 762

Answers (1)

Ank K

Reputation: 11

I looked into the API in more detail. It provides a plugin mechanism for overriding the parsing: https://rometools.github.io/rome/RssAndAtOMUtilitiEsROMEV0.5AndAboveTutorialsAndArticles/RssAndAtOMUtilitiEsROMEPluginsMechanism.html

I wrote a class that extends RSS20Parser, implements WireFeedParser, and overrides the parseItem method:

    @Override
    public Item parseItem(Element rssRoot, Element eItem, Locale locale) {
        Item item = super.parseItem(rssRoot, eItem, locale);

        Element imageElement = eItem.getChild("image", getRSSNamespace());
        if (imageElement != null) {
            // URL given directly as the element's text (e.g. inside CDATA);
            // getTextTrim() drops the whitespace around the CDATA section
            String imageUrl = imageElement.getTextTrim();

            // Prefer a nested <url> child when one is present
            Element urlElement = imageElement.getChild("url");
            if (urlElement != null) {
                imageUrl = urlElement.getTextTrim();
            }

            // Expose the image URL as an enclosure of type "image"
            Enclosure e = new Enclosure();
            e.setType("image");
            e.setUrl(imageUrl);
            item.getEnclosures().add(e);
        }

        return item;
    }

Now, in the SyndFeed, go through each entry's enclosures list and you will find the image URL:

    List<SyndEntry> entries = feed.getEntries();
    for (SyndEntry entry : entries) {
        ...
        ...
        List<SyndEnclosure> enclosures = entry.getEnclosures();
        if (enclosures != null) {
            for (SyndEnclosure enclosure : enclosures) {
                if ("image".equals(enclosure.getType())) {
                    System.out.println("image URL : " + enclosure.getUrl());
                }
            }
        }
    }

Finally, create a rome.properties file that is accessible on the classpath, with the following entry, so that ROME picks up the custom parser:

    WireFeedParser.classes=your.package.name.CustomRomeRssParser
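
The extraction logic used in parseItem (take the `<image>` text, prefer a nested `<url>` child, trim the whitespace around the CDATA section) can be checked without ROME at all. The following is only a rough sketch using the JDK's built-in DOM parser instead of JDOM; the class name `ImageUrlExtractor`, the method `extractImageUrl`, and the sample item XML are made up for illustration and are not part of ROME or of the ESPN feed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of ROME: mirrors the parseItem logic with the JDK DOM API.
public class ImageUrlExtractor {

    /** Returns the trimmed image URL of an <item> fragment, or null if it has no <image> child. */
    public static String extractImageUrl(String itemXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(itemXml)));

        Element image = firstChildElement(doc.getDocumentElement(), "image");
        if (image == null) {
            return null;
        }

        // Prefer a nested <url> child; otherwise use the text (or CDATA) of <image> itself.
        Element url = firstChildElement(image, "url");
        Element source = (url != null) ? url : image;

        // getTextContent() includes CDATA content; trim() removes the surrounding line breaks.
        return source.getTextContent().trim();
    }

    /** Finds the first direct child element with the given tag name, or null. */
    private static Element firstChildElement(Element parent, String name) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node instanceof Element && name.equals(node.getNodeName())) {
                return (Element) node;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample item that imitates the CDATA layout shown in the question.
        String item = "<item><title>t</title><image>\n<![CDATA[\n  http://example.com/pic.jpg\n]]>\n</image></item>";
        System.out.println("image URL : " + extractImageUrl(item));
    }
}
```

If the trimmed value looks right here but the enclosure URL from ROME still carries stray whitespace, the trimming step in the custom parser is the first place I would look.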

Upvotes: 0
