First of all, this isn't strictly a programming question, and I'm sorry for posting it here, but I really need to know. I'm building an RSS reader app and want to know where information about featured images is stored in an RSS XML feed. Below is an excerpt from the XML I get from CNN's RSS feed. Where is the information about the images?
<item><title>Ice melt speeding up, study finds</title><guid>http://edition.cnn.com/2012/11/29/world/europe/climate-ice-sheets/index.html</guid><link>http://edition.cnn.com/2012/11/29/world/europe/climate-ice-sheets/index.html?eref=edition</link><description>Two decades of satellite readings back up what dramatic pictures have suggested in recent years: The mile-thick ice sheets that cover Greenland and most of Antarctica are melting at a faster rate in a warming world.</description><pubDate>Thu, 27 Jun 2013 08:59:27 EDT</pubDate></item>
<item><title>Twins 'stolen' from hospital rescued</title><guid>http://edition.cnn.com/2013/08/10/world/asia/china-baby-trafficking-twin-girls/index.html</guid><link>http://edition.cnn.com/2013/08/10/world/asia/china-baby-trafficking-twin-girls/index.html?eref=edition</link><description>Police in China have rescued twin baby girls allegedly sold by a maternity doctor, bringing the number of infants recovered from the suspected trafficking ring to three, state media reported. </description><pubDate>Sun, 11 Aug 2013 19:31:43 EDT</pubDate></item>
<item><title>HK makes $5M ivory bust</title><guid>http://edition.cnn.com/2013/08/08/world/hong-kong-ivory-tusk-seizure-august/index.html</guid><link>http://edition.cnn.com/2013/08/08/world/hong-kong-ivory-tusk-seizure-august/index.html?eref=edition</link><description>In one of the biggest busts of its kind in Hong Kong, customs authorities this week seized more than 1,100 ivory tusks, 13 rhino horns and five leopard pelts. The haul, found in a container shipped from Nigeria, is valued at more than $5.3 million.</description><pubDate>Sun, 11 Aug 2013 19:31:58 EDT</pubDate></item>
<item><title>Human transmission of H7N9</title><guid>http://edition.cnn.com/2013/08/07/health/china-bird-flu-transmission/index.html</guid><link>http://edition.cnn.com/2013/08/07/health/china-bird-flu-transmission/index.html?eref=edition</link><description>Until this week, no cases of human-to-human transmission of the deadly bird flu virus that broke out in China this year had been reported.</description><pubDate>Wed, 07 Aug 2013 22:16:18 EDT</pubDate></item>
<item><title>Doctor accused of taking newborns</title><guid>http://edition.cnn.com/2013/08/07/world/asia/china-baby-trafficking-shaanxi/index.html</guid><link>http://edition.cnn.com/2013/08/07/world/asia/china-baby-trafficking-shaanxi/index.html?eref=edition</link><description>Chinese health authorities have promised an overhaul in hospitals across the country following the arrest of an obstetrician for allegedly selling newborns to human traffickers, state media reports.</description><pubDate>Wed, 07 Aug 2013 03:38:22 EDT</pubDate></item>
<item><title>Chinese tourists targeted in Paris</title><guid>http://edition.cnn.com/2013/08/07/travel/chinese-tourists-paris-pickpockets/index.html</guid><link>http://edition.cnn.com/2013/08/07/travel/chinese-tourists-paris-pickpockets/index.html?eref=edition</link><description>It's known as the City of Light, but it risks becoming known as the city of the light-fingered.</description><pubDate>Wed, 07 Aug 2013 22:16:33 EDT</pubDate></item>
Do I have to write a web crawler that follows the feed links and scrapes images and text from the destination page? I just need to know how professional RSS readers work.
FYI, I've googled this a lot without success, which is why I'm asking here. Please help.
Since this feed doesn't include any image information in its XML, the images have to be crawled from the linked pages.
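Other feeds do sometimes carry images. RSS 2.0 defines an `<enclosure>` element, and many publishers add Media RSS tags (`media:thumbnail`, `media:content`, namespace `http://search.yahoo.com/mrss/`). A reader would therefore usually check for those first and crawl the linked page only when nothing is found. The following is a minimal sketch using Python's standard library. The helper name `find_feed_image` is hypothetical, and it assumes each `<item>` is passed in as a standalone XML string.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace URI used by Media RSS (media:thumbnail, media:content).
MEDIA_NS = "http://search.yahoo.com/mrss/"


def find_feed_image(item_xml: str) -> Optional[str]:
    """Return an image URL declared inside an RSS <item>, or None if the
    item carries no image metadata (so the linked page must be crawled)."""
    item = ET.fromstring(item_xml)

    # RSS 2.0: <enclosure url="..." type="image/jpeg" length="..."/>
    for enc in item.findall("enclosure"):
        if enc.get("type", "").startswith("image/") and enc.get("url"):
            return enc.get("url")

    # Media RSS: <media:thumbnail url="..."/>
    thumb = item.find(f"{{{MEDIA_NS}}}thumbnail")
    if thumb is not None and thumb.get("url"):
        return thumb.get("url")

    # Media RSS: <media:content url="..." medium="image"/> (or an image MIME type)
    for content in item.findall(f"{{{MEDIA_NS}}}content"):
        is_image = (content.get("medium") == "image"
                    or content.get("type", "").startswith("image/"))
        if is_image and content.get("url"):
            return content.get("url")

    # Nothing in the feed itself: fall back to crawling the item's <link>.
    return None
```

For the CNN items in the question this returns `None`, because they contain only `title`, `guid`, `link`, `description` and `pubDate`.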
Do I have to write a web crawler that follows the feed links and scrapes images and text from the destination page?
Yes. For the CNN stories you linked, the title image is always inside the div with class "cnn_stryimg640captioned".
Stories whose header is a video or an image gallery have to be handled separately.
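The per-site extraction described above could look roughly like the following sketch, built on Python's `html.parser`. The class name `cnn_stryimg640captioned` is taken from this answer; CNN's markup has very likely changed since, so treat it as an example parameter. The HTML in the usage tests is made up, and `extract_title_image` is a hypothetical helper name.

```python
from html.parser import HTMLParser
from typing import Optional


class TitleImageFinder(HTMLParser):
    """Finds the first <img src> nested inside a <div> with a given CSS class."""

    def __init__(self, div_class: str):
        super().__init__()
        self.div_class = div_class
        self.depth = 0  # > 0 while inside the target div (counts nested divs)
        self.src: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth:
                self.depth += 1  # a div nested inside the target div
            elif self.div_class in (attrs.get("class") or "").split():
                self.depth = 1   # entered the target div
        elif tag == "img" and self.depth and self.src is None:
            self.src = attrs.get("src")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def extract_title_image(html: str,
                        div_class: str = "cnn_stryimg640captioned") -> Optional[str]:
    """Return the src of the first image inside the title-image div, or None."""
    finder = TitleImageFinder(div_class)
    finder.feed(html)
    finder.close()
    return finder.src
```

The page HTML itself could be downloaded with `urllib.request.urlopen(link).read()` and decoded before being passed in. When the div contains no `<img>` (for example, if the header is a video or gallery), the sketch simply returns `None`. That is where those cases would need their own handling.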
I just need to know how professional RSS readers work.
Professional RSS readers use heuristics to decide which images on a page are relevant to the article. They don't always get it right, though.
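As an illustration only, here is one simple heuristic of that kind. It is not the algorithm of any particular reader. It prefers the page's Open Graph `og:image` meta tag. Failing that, it picks the largest `<img>` by its declared `width`×`height`, skipping tiny images such as tracking pixels. The helper names and the 100×100 threshold are assumptions.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


def _area(attrs: dict) -> int:
    """Declared width*height, or 0 if either is missing or not a plain integer."""
    try:
        return int(attrs.get("width") or 0) * int(attrs.get("height") or 0)
    except ValueError:  # e.g. width="100%" or width="640px"
        return 0


class ImageCandidates(HTMLParser):
    """Collects the og:image meta tag and every <img> with its declared size."""

    def __init__(self):
        super().__init__()
        self.og_image: Optional[str] = None
        self.images: List[Tuple[str, int]] = []  # (src, declared area)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("property") == "og:image" and a.get("content"):
            if self.og_image is None:
                self.og_image = a["content"]
        elif tag == "img" and a.get("src"):
            self.images.append((a["src"], _area(a)))


def pick_lead_image(html: str, min_area: int = 100 * 100) -> Optional[str]:
    """Guess the article's lead image from its HTML (hypothetical heuristic)."""
    parser = ImageCandidates()
    parser.feed(html)
    parser.close()
    if parser.og_image:  # 1. the image the publisher declared for previews
        return parser.og_image
    big = [(src, area) for src, area in parser.images if area >= min_area]
    if big:  # 2. the largest image that passes the size threshold
        return max(big, key=lambda c: c[1])[0]
    return None  # 3. no good candidate: show the item without an image
```

Declared sizes are often missing, so a more careful version would download the candidates and measure their real dimensions. That step is omitted here.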