David542

Reputation: 110083

Parse xml block with lxml

Given the following xml:

<language>en-US</language>
<provider>VenturesLLC</provider>
<video>
    <original_spoken_locale>en-US</original_spoken_locale>
    <vendor_offer_code>TEST_VENDOR</vendor_offer_code>
    <release_date>2011-01-15</release_date>
    <title>Moving Forward</title>
    <vendor_id>ASDF_ING_2012</vendor_id>
</video>

I am looking to retrieve the entire <video> block. However, when I do:

>>> from lxml import etree
>>> f = open('metadata.xml')
>>> contents = f.read()
>>> node = etree.fromstring(contents)
>>> node.xpath("//*[local-name()='video']")[0].text
'\n    '

Note that if I do something like node.xpath("//*[local-name()='original_spoken_locale']")[0].text, I get the correct value of 'en-US'. How can I pull the complete text of the block, so that I get:

text = """    
<video>
    <original_spoken_locale>en-US</original_spoken_locale>
    <vendor_offer_code>TEST_VENDOR</vendor_offer_code>
    <release_date>2011-01-15</release_date>
    <title>Moving Forward</title>
    <vendor_id>ASDF_ING_2012</vendor_id>
</video>"""

Upvotes: 1

Views: 343

Answers (1)

Daenyth

Reputation: 37431

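Your .text call isn't returning what you expect because .text holds only the text that appears before the first child element. For your video node that is just the newline and indentation; everything else inside it is child elements. To get the whole block as a string, serialize the node with etree.tostring: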

In [1]: from lxml import etree

In [2]: xml = '''<xml>
   ...: <language>en-US</language>
   ...: <provider>VenturesLLC</provider>
   ...: <video>
   ...:     <original_spoken_locale>en-US</original_spoken_locale>
   ...:     <vendor_offer_code>TEST_VENDOR</vendor_offer_code>
   ...:     <release_date>2011-01-15</release_date>
   ...:     <title>Moving Forward</title>
   ...:     <vendor_id>ASDF_ING_2012</vendor_id>
   ...: </video></xml>'''

In [3]: tree = etree.fromstring(xml)

In [4]: vid = tree.xpath('//video')[0]

In [5]: etree.tostring(vid, pretty_print=True, encoding='unicode')
Out[5]: '<video>\n  <original_spoken_locale>en-US</original_spoken_locale>\n  <vendor_offer_code>TEST_VENDOR</vendor_offer_code>\n  <release_date>2011-01-15</release_date>\n  <title>Moving Forward</title>\n  <vendor_id>ASDF_ING_2012</vendor_id>\n</video>\n'

In [6]: print(_)
<video>
  <original_spoken_locale>en-US</original_spoken_locale>
  <vendor_offer_code>TEST_VENDOR</vendor_offer_code>
  <release_date>2011-01-15</release_date>
  <title>Moving Forward</title>
  <vendor_id>ASDF_ING_2012</vendor_id>
</video>

Upvotes: 2
