I am trying to parse XML and am having a hard time. I don't understand why the results keep printing [<Element 'Results' at 0x105fc6110>].
I am trying to extract the <Social> links from my example with the following code:
import xml.etree.ElementTree as ET

root = ET.parse("test.xml")
results = root.findall("Results")
print(results)  # [<Element 'Results' at 0x105fc6110>]
# WHAT IS THIS??
for result in results:
    print(result.find("Social"))  # None
The XML looks like this:
<?xml version="1.0"?>
<List1>
  <NextOffset>AAA</NextOffset>
  <Results>
    <R>
      <D>internet.com</D>
      <META>
        <Social>
          <v>http://twitter.com/internet</v>
          <v>http://facebook.com/internet</v>
        </Social>
        <Telephones>
          <v>+1-555-555-6767</v>
        </Telephones>
      </META>
    </R>
  </Results>
</List1>
Upvotes: 2
results = root.findall("Results") is a list of xml.etree.ElementTree.Element objects.
type(results)
# list
type(results[0])
# xml.etree.ElementTree.Element
find and findall, when given a plain tag name, only look at direct children. The iter method iterates over matching descendants at any level.
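A small side-by-side comparison of the three lookups on the <Results> element may make the difference concrete. This is a sketch assuming a trimmed inline copy of the XML (SAMPLE_XML is hypothetical); the ".//Social" path form is standard ElementTree path syntax and is the approach used in the other answer below.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed copy of the question's XML
SAMPLE_XML = """<List1><Results><R><META>
<Social><v>http://twitter.com/internet</v><v>http://facebook.com/internet</v></Social>
</META></R></Results></List1>"""

results_elem = ET.fromstring(SAMPLE_XML).find("Results")

# A plain tag name only matches direct children of <Results>, so nothing is found
direct = results_elem.find("Social")

# A ".//" path searches all descendants of <Results>
anywhere = results_elem.findall(".//Social")

# iter() walks the element itself and all of its descendants
walked = list(results_elem.iter("Social"))

print(direct)                      # None
print(len(anywhere), len(walked))  # 1 1
```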
If <Results> could potentially have more than one <Social> element, you could use this:
for result in results:
    for soc in result.iter("Social"):
        for link in soc.iter("v"):
            print(link.text)
That's the worst-case scenario. If you know there will be only one <Social> per <Results>, it simplifies to:
for soc in root.iter("Social"):
    for link in soc.iter("v"):
        print(link.text)
Both print:
http://twitter.com/internet
http://facebook.com/internet
Or use nested list comprehensions and do it with one line of code. Because Python...
socialLinks = [[v.text for v in soc] for soc in root.iter("Social")]
# socialLinks == [['http://twitter.com/internet', 'http://facebook.com/internet']]
socialLinks is a list of lists. The outer list has one entry per <Social> element (only one in this example). Each inner list contains the text of the v elements within that particular <Social> element.
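If a flat list is more convenient than a list of lists, one option (swapped in here for illustration, not part of the answer above) is the ElementTree path ".//Social/v", which matches every <v> whose parent is a <Social>, anywhere in the tree. A sketch, assuming an inline copy of the XML (SAMPLE_XML is hypothetical):

```python
import xml.etree.ElementTree as ET

# Hypothetical inline copy of the question's XML, including <Telephones>
SAMPLE_XML = """<List1><Results><R><META>
<Social><v>http://twitter.com/internet</v><v>http://facebook.com/internet</v></Social>
<Telephones><v>+1-555-555-6767</v></Telephones>
</META></R></Results></List1>"""

root = ET.fromstring(SAMPLE_XML)

# Nested: one inner list per <Social> element (same as the one-liner above)
social_links = [[v.text for v in soc] for soc in root.iter("Social")]

# Flat: only <v> elements whose parent is <Social>; the phone number is excluded
flat_links = [v.text for v in root.findall(".//Social/v")]

print(social_links)
print(flat_links)
```

The nested form keeps track of which <Social> each link came from; the flat form loses that grouping but is easier to consume when there is only one <Social>.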
Upvotes: 2
findall returns a list of xml.etree.ElementTree.Element objects. In your case, you only have one Results node, so you could use find to get the first (and only) match.
Once you have it, use find with the .// syntax, which searches anywhere in the subtree, not only directly under Results. Once you have found it, just call findall on the v tag and print the text:
import xml.etree.ElementTree as ET

root = ET.parse("test.xml")
result = root.find("Results")
social = result.find(".//Social")
for r in social.findall("v"):
    print(r.text)
This prints:
http://twitter.com/internet
http://facebook.com/internet
Note that I did not perform any validity checks on the XML file. You should check whether find returns None and handle that case accordingly.
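One way to add those checks is sketched below. extract_social_links is a hypothetical helper name, and returning an empty list when an element is missing is an assumed policy; raising an exception or logging instead would be equally reasonable.

```python
import xml.etree.ElementTree as ET


def extract_social_links(root):
    """Return the <v> texts under the first <Social> inside <Results>.

    Hypothetical helper: returns [] instead of raising when an element is missing.
    """
    result = root.find("Results")
    if result is None:  # no <Results> directly under the root
        return []
    social = result.find(".//Social")
    if social is None:  # <Results> exists but contains no <Social>
        return []
    return [v.text for v in social.findall("v")]


good = ET.fromstring(
    "<List1><Results><R><META><Social>"
    "<v>http://twitter.com/internet</v>"
    "</Social></META></R></Results></List1>"
)
no_results = ET.fromstring("<List1><NextOffset>AAA</NextOffset></List1>")
no_social = ET.fromstring("<List1><Results><R/></Results></List1>")

print(extract_social_links(good))        # ['http://twitter.com/internet']
print(extract_social_links(no_results))  # []
print(extract_social_links(no_social))   # []
```

The explicit "is None" comparison matters: an Element with no children evaluates as false, so a check like "if not social:" could misfire on an element that exists but is empty.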
Note that even though I'm not very confident with the XML format myself, I learned everything I know about parsing it by following this lxml tutorial.
Upvotes: 2