a3nm

Reputation: 8884

Retrieving raw XML for items with feedparser

I'm trying to use feedparser to retrieve some specific information from feeds, but I also want the raw XML of each entry (i.e., the <item> elements for RSS and the <entry> elements for Atom), and I can't see how to do that. Obviously I could parse the XML by hand, but that isn't very elegant, would require separate support for RSS and Atom, and I imagine it could fall out of sync with feedparser for ill-formed feeds. Is there a better way?
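For comparison, the by-hand approach mentioned above might look like the following minimal sketch. It uses the standard library's xml.etree.ElementTree and assumes RSS 2.0 (<item> elements under <channel>, no namespace) and Atom 1.0 (<entry> elements in the http://www.w3.org/2005/Atom namespace); raw_entries is a hypothetical helper name. It shows the drawbacks the question anticipates: each format needs its own lookup, elements are re-serialized rather than returned as the original bytes (Atom elements come back with generated ns0: prefixes, and any text trailing an element is included), and ill-formed input raises ET.ParseError instead of being parsed leniently.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def raw_entries(feed_xml):
    """Return re-serialized XML for each RSS 2.0 <item> or Atom <entry>.

    Hypothetical helper: one lookup per feed format, and no recovery
    from ill-formed XML (ET.fromstring raises ET.ParseError).
    """
    root = ET.fromstring(feed_xml)
    # RSS 2.0: items live under <channel> and have no namespace.
    items = root.findall("./channel/item")
    # Atom: entries are direct children of <feed>, in the Atom namespace.
    entries = root.findall("{%s}entry" % ATOM_NS)
    return [ET.tostring(el, encoding="unicode") for el in items + entries]
```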

Thanks!

Answers (1)

Kurt McKee

Reputation: 1410

I'm the current developer of feedparser. At the moment, one way to get that information is to monkeypatch feedparser._FeedParserMixin (or edit a local copy of feedparser.py). The methods you'll want to modify are:

  • feedparser._FeedParserMixin.unknown_starttag
  • feedparser._FeedParserMixin.unknown_endtag

At the top of each method, insert a call to a callback of your own that captures the elements and their attributes as feedparser encounters them.
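The monkeypatching described above could be sketched as follows. This is a minimal illustration, not feedparser's own API: add_tag_hooks, EntryTagRecorder and ENTRY_TAGS are hypothetical names, and the sketch assumes feedparser passes default-namespace RSS and Atom tags to these methods as "item" and "entry". The wrappers call your callback first and then the original method, so feedparser's parsing is unchanged. Note that these two hooks only see tags and attributes, not text content, so what gets recorded is each entry's element structure rather than a byte-for-byte copy of its raw XML.

```python
import functools

# Assumed tag names for an RSS <item> and an Atom <entry> as feedparser
# hands them to unknown_starttag/unknown_endtag (default-namespace feeds).
ENTRY_TAGS = {"item", "entry"}


class EntryTagRecorder:
    """Hypothetical callback target: collects start/end events per entry."""

    def __init__(self):
        self.entries = []      # one list of events per completed item/entry
        self._current = None   # events for the item/entry being parsed

    def on_start(self, tag, attrs):
        if tag in ENTRY_TAGS:
            self._current = []
        if self._current is not None:
            # Copy attrs so later changes by the parser don't affect us.
            self._current.append(("start", tag, list(attrs)))

    def on_end(self, tag):
        if self._current is None:
            return
        self._current.append(("end", tag))
        if tag in ENTRY_TAGS:
            self.entries.append(self._current)
            self._current = None


def add_tag_hooks(mixin_cls, on_start, on_end):
    """Patch mixin_cls so on_start/on_end run at the top of
    unknown_starttag/unknown_endtag. Returns a function that undoes it."""
    orig_start = mixin_cls.unknown_starttag
    orig_end = mixin_cls.unknown_endtag

    @functools.wraps(orig_start)
    def unknown_starttag(self, tag, attrs):
        on_start(tag, attrs)                  # our callback runs first...
        return orig_start(self, tag, attrs)   # ...then the original logic

    @functools.wraps(orig_end)
    def unknown_endtag(self, tag):
        on_end(tag)
        return orig_end(self, tag)

    mixin_cls.unknown_starttag = unknown_starttag
    mixin_cls.unknown_endtag = unknown_endtag

    def restore():
        mixin_cls.unknown_starttag = orig_start
        mixin_cls.unknown_endtag = orig_end

    return restore
```

With feedparser installed, you would pass feedparser._FeedParserMixin as mixin_cls (its location in the single-file feedparser.py this answer refers to; later releases may have moved or renamed the class, so check your version), call feedparser.parse(...), and then call the returned restore function. Because the patch is applied to the class, it affects every parse in the process until restored, so it is not safe to use concurrently from several threads.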
