pallavi Kulkarni

Reputation: 131

How to obtain the field names of an RSS feed (XML file) dynamically in Python using feedparser?

I am using the feedparser library in Python to read RSS feeds from a particular URL. The parsed feed is stored in the variable fee by the following line of code:

fee =  feedparser.parse('http://www.indiatimes.com/r/python/.rss')

fee contains the feed in a nested, list-of-lists-like structure. The format and the data it holds are complex and not fixed.

I want to obtain the names of the fields (keys) of this RSS feed dynamically. How can I do that?

Some field names are fixed, such as link and date, but I need the names of all fields in my code.

Upvotes: 0

Views: 1922

Answers (3)

Roshan Bagdiya

Reputation: 2178

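The code below gives you all the key names:

import feedparser

feeds_all = feedparser.parse(URL)  # URL is the address of your feed
feed_all_keys = feeds_all.keys()
feed_keys = feeds_all.feed.keys()
# entries is a list, so read the keys from an individual entry
entries_keys = feeds_all.entries[0].keys() if feeds_all.entries else []

  1. feed_all_keys holds the top-level keys of the parsed result
  2. feed_keys holds the keys related to the feed (channel)
  3. entries_keys holds the keys of the first entry (item); other entries may carry different keys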
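
Because entries is a list and individual items need not share the same fields, it can help to collect the union of keys over every entry. The following is a minimal sketch, not part of feedparser itself: collect_entry_keys is a hypothetical helper name, and the sample data only stands in for real parsed entries. It relies solely on each entry being dict-like, which feedparser entries are.

```python
def collect_entry_keys(entries):
    """Return the sorted union of the keys found across all entries."""
    keys = set()
    for entry in entries:
        keys.update(entry.keys())
    return sorted(keys)


# Stand-in data shaped like feedparser entries (hypothetical values).
sample_entries = [
    {"title": "First", "link": "http://example.org/1"},
    {"title": "Second", "link": "http://example.org/2", "author": "someone"},
]
print(collect_entry_keys(sample_entries))

# With a real feed (requires feedparser and network access):
# import feedparser
# print(collect_entry_keys(feedparser.parse(URL).entries))
```

An empty entries list simply yields an empty result, so no special handling is needed for feeds without items.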

Upvotes: 0

pallavi Kulkarni

Reputation: 131

feeds_all =  feedparser.parse('http://www.indiatimes.com/r/python/.rss')

I am not sure exactly what kind of JSON-like object parse() returns, but .keys() and .values() work fine on it. Answers that access fixed keys require you to know the key names in advance. To get the names of previously unknown keys dynamically, I called fee.keys(), and it worked!

So, the answer is in the following lines: channel_keys = feeds_all.keys() and feed_keys = feeds_all.feed.keys(). To get the values of those keys, use feed_values = feeds_all.feed.values().
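
Since the parsed result nests dictionaries inside lists inside dictionaries, calling .keys() at one level only shows part of the structure. As a rough sketch, a recursive walk can list every key with its full path. key_paths is a hypothetical helper, and parsed_like is made-up data that only imitates the shape of a parsed feed; the function assumes nothing beyond ordinary dict and list behaviour.

```python
def key_paths(node, prefix=""):
    """Yield a dotted path for every key in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from key_paths(value, path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from key_paths(item, f"{prefix}[{index}]")


# Made-up structure imitating the shape of a parsed feed.
parsed_like = {
    "feed": {"title": "Sample Feed", "link": "http://example.org/"},
    "entries": [{"title": "First entry title", "tags": [{"term": "python"}]}],
}
for path in key_paths(parsed_like):
    print(path)

# With a real feed: for path in key_paths(feedparser.parse(URL)): print(path)
```

Plain values such as strings end the recursion, so only actual keys are reported.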

Upvotes: 0

BoreBoar

Reputation: 2749

First of all, the link you're using returns a 404 error, so you are not going to get any RSS from it.

Secondly, an RSS link usually ends in .rss.

ex: http://timesofindia.feedsportal.com/c/33039/f/533916/index.rss
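
A dead link like the one in the question can be caught in code before reading any fields. The sketch below is one possible check under stated assumptions: fetch_problem is a hypothetical helper, while status, bozo, bozo_exception and entries are fields of the feedparser result (status is only present when the feed was fetched over HTTP). The function uses only .get(), so it works on any dict-like result.

```python
def fetch_problem(parsed):
    """Describe what looks wrong with a parsed result, or return None if it seems usable."""
    status = parsed.get("status")  # only set when the feed was fetched over HTTP
    if status is not None and status >= 400:
        return f"HTTP error {status}"
    if parsed.get("bozo") and not parsed.get("entries"):
        return f"not a well-formed feed: {parsed.get('bozo_exception')}"
    if not parsed.get("entries"):
        return "feed contains no entries"
    return None


# Stand-in for the result of parsing a dead link (hypothetical values).
print(fetch_problem({"status": 404, "bozo": 1, "entries": []}))

# With a real feed: print(fetch_problem(feedparser.parse(URL)))
```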

Once you get an actual working RSS link, all you have to do is this:

fee = feedparser.parse('http://timesofindia.feedsportal.com/c/33039/f/533916/index.rss')
for entry in fee.entries:
    print(entry.title)
    print(entry.link)

What I wrote above gets the item elements.

Let me provide you with a better example.

import feedparser
rss_document = """<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Sample Feed</title>
<description>For documentation &lt;em&gt;only&lt;/em&gt;</description>
<link>http://example.org/</link>
<pubDate>Sat, 07 Sep 2002 00:00:01 GMT</pubDate>
<!-- other elements omitted from this example -->
<item>
<title>First entry title</title>
<link>http://example.org/entry/3</link>
<description>Watch out for &lt;span style="background-image:
url(javascript:window.location='http://example.org/')"&gt;nasty
tricks&lt;/span&gt;</description>
<pubDate>Thu, 05 Sep 2002 00:00:01 GMT</pubDate>
<guid>http://example.org/entry/3</guid>
<!-- other elements omitted from this example -->
</item>
</channel>
</rss>
"""
rss = feedparser.parse(rss_document)

# Channel Details

print("-----Channel Details-----")

print(rss.feed.title)
print(rss.feed.description)
print(rss.feed.link)

# Item Details

print("-----Item Details-----")
for entry in rss.entries:
    print(entry.title)
    print(entry.summary, end=" ")
    print(entry.link)

Upvotes: 1
