Reputation: 5
I'm trying to read through an XML feed I'm receiving, but I can't access the specific elements. I'm using Python, and the Python documentation is really unclear about what I should use.
Here is the feed:
<title>More eagle</title>
<summary>http://www.181.fm/winamp.plsstation=181eagle&style=&description=The%20Eagle%20(Classic ...</summary>
<link rel="alternate" href="http://mail.google.com/mail [email protected]&message_id=12995390f36c310b&view=conv&extsrc=atom" type="text/html" />
<modified>2010-07-02T22:13:51Z</modified>
<issued>2010-07-02T22:13:51Z</issued>
<id>tag:gmail.google.com,2004:1340194246143783179 </id>
And here is my current function:
def parse_xml(feed):
    feedxml = minidom.parseString(feed)
    name = feedxml.getElementsByTagName('name')
    subject = feedxml.getElementsByTagName('title')
    contents = feedxml.getElementsByTagName('summary')
    return name + "\n" + subject + "\n" + contents
Upvotes: 0
Views: 238
Reputation: 61
To get the text of an element you have to do something like this:
from xml.dom import minidom

def getElementText(node, tagName):
    # Return the text of the first element named tagName below node
    for element in node.getElementsByTagName(tagName):
        result = ""  # handle empty elements
        for tnode in element.childNodes:
            if tnode.nodeType == tnode.TEXT_NODE:
                result += tnode.data  # append, so split text nodes are not lost
        return result
    return ""  # no matching element; keeps the concatenation below from failing

def parse_xml(feed):
    feedxml = minidom.parseString(feed)
    name = getElementText(feedxml, 'name')
    subject = getElementText(feedxml, 'title')
    contents = getElementText(feedxml, 'summary')
    return name + "\n" + subject + "\n" + contents
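If the feed holds more than one message, it may help to look up the tags per entry rather than on the whole document. Here is a rough sketch, assuming each message sits in its own `<entry>` element (as in Atom feeds). The sample feed, `first_text` and `parse_entries` are made-up names for illustration, not something from the question:

```python
from xml.dom import minidom

# Hypothetical two-entry feed; a real feed carries more fields.
SAMPLE_FEED = """<feed>
  <title>Inbox</title>
  <entry><title>More eagle</title><summary>first summary</summary></entry>
  <entry><title>Second message</title><summary></summary></entry>
</feed>"""

def first_text(node, tag_name):
    """Text of the first <tag_name> below node, or "" if missing or empty."""
    for element in node.getElementsByTagName(tag_name):
        return "".join(child.data for child in element.childNodes
                       if child.nodeType == child.TEXT_NODE)
    return ""

def parse_entries(feed):
    """Return one (title, summary) tuple per <entry> in the feed."""
    doc = minidom.parseString(feed)
    return [(first_text(entry, "title"), first_text(entry, "summary"))
            for entry in doc.getElementsByTagName("entry")]

entries = parse_entries(SAMPLE_FEED)
for title, summary in entries:
    print(title + "\n" + summary)
```

Searching from each entry instead of from the document also avoids picking up a feed-level `<title>` by accident.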
Upvotes: 1
Reputation: 336078
getElementsByTagName() returns a list of elements. So if you want the first (or only) one, you need to use getElementsByTagName('name')[0].
But that gives you an element object, not the text it encloses (which I presume is what you're interested in).
So you probably need to do something like this:
nametag = feedxml.getElementsByTagName('name')[0]
nametag.normalize()
name = nametag.firstChild.data
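Note that the snippet above raises an error if the tag is missing (IndexError) or the element is empty (firstChild is None). A minimal guarded sketch of the same idea follows; `first_element_text` and its `default` parameter are names I made up for illustration:

```python
from xml.dom import minidom

def first_element_text(doc, tag_name, default=""):
    """Return the text of the first <tag_name>, or default if unavailable."""
    elements = doc.getElementsByTagName(tag_name)
    if not elements:
        return default  # tag not present at all
    element = elements[0]
    element.normalize()  # merge adjacent text nodes into one
    child = element.firstChild
    if child is None or child.nodeType != child.TEXT_NODE:
        return default  # empty element, or first child is not text
    return child.data

# Hypothetical input: one filled tag, one empty tag, one tag absent.
doc = minidom.parseString("<entry><name>Alice</name><title/></entry>")
print(first_element_text(doc, "name"))
print(repr(first_element_text(doc, "title")))
print(repr(first_element_text(doc, "summary")))
```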
Upvotes: 1