DocWiki

Reputation: 3584

Python and XML Processing

I have used urllib to get the following data:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<videos xmlns:xs="http://www.w3.org/2001/XMLSchema" 
        xmlns:www="http://www.www.com"">
  <video type="cl">
    <cd>
      <src lang="music">http://www.google.com/ </src>
    </cd>
  </video>
</videos>

I want to extract http://www.google.com/ from it. Here is my code:

import xml.etree.ElementTree as etree
data='<?xml version="1.0" encoding="UTF-8" standalone="yes"?><videos xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:www="http://www.www.com""><video type="cl"><cd><src lang="music">http://www.google.com/ </src></cd></video></videos>'
tree = etree.fromstring(data)
geturl=tree.findtext('/video/cd/src').strip()
print geturl

I get this error:

AttributeError: 'NoneType' object has no attribute 'strip'

Obviously, findtext failed. I also tried findtext('src'), which doesn't work either.

What's wrong?

Upvotes: 1

Views: 357

Answers (1)

unutbu

Reputation: 880707

Remove the leading forward slash from the path, so that it reads video/cd/src:

import xml.etree.ElementTree as etree
data='''<?xml version="1.0" encoding="UTF-8" standalone="yes"?><videos xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:www="http://www.www.com"><video type="cl"><cd><src lang="music">http://www.google.com/ </src></cd></video></videos>'''
tree = etree.fromstring(data)  # fromstring returns the root element, <videos>
geturl = tree.findtext('video/cd/src').strip()  # path is relative to the root
print(geturl)

yields

http://www.google.com/

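A leading forward slash indicates an absolute path, which is not allowed when searching from an element. Paths are resolved relative to the element you call findtext on, which here is the root element, <videos>. That is also why findtext('src') fails: a bare tag name matches only direct children of the root, and src is nested two levels down. To match src at any depth, use './/src'.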
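To make the path rules concrete, here is a minimal sketch comparing three paths against the same tree. It assumes a simplified copy of the posted data (XML declaration and namespace declarations dropped), and find_url is a hypothetical helper name introduced only for this illustration. It returns None instead of raising AttributeError when nothing matches:

```python
import xml.etree.ElementTree as etree

# Simplified copy of the posted data (assumption: declaration and
# namespace attributes omitted, since they do not affect the lookup).
data = (
    '<videos><video type="cl"><cd>'
    '<src lang="music">http://www.google.com/ </src>'
    '</cd></video></videos>'
)
root = etree.fromstring(data)  # root is the <videos> element


def find_url(element, path):
    """Hypothetical helper: stripped text at path, or None if no match."""
    text = element.findtext(path)
    return text.strip() if text is not None else None


relative = find_url(root, 'video/cd/src')  # explicit path relative to the root
anywhere = find_url(root, './/src')        # any <src> descendant of the root
direct = find_url(root, 'src')             # direct children only, so no match

print(relative, anywhere, direct)
```

Checking for None before calling strip() avoids the AttributeError from the question whenever a path does not match.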

PS. There is also a syntax error in the data you posted: xmlns:www="http://www.www.com"" ends with two double quotes, so the XML is not well-formed.
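As a rough illustration of that point, the sketch below parses a shortened version of the data with and without the stray quote. The shortened strings and the is_well_formed helper are assumptions made for this example, not part of the original post:

```python
import xml.etree.ElementTree as etree

# Shortened stand-ins for the posted data (assumption: only the root
# element and the namespace attribute in question are kept).
bad = '<videos xmlns:www="http://www.www.com""><video/></videos>'
good = '<videos xmlns:www="http://www.www.com"><video/></videos>'


def is_well_formed(xml_text):
    """Hypothetical helper: True if the parser accepts xml_text."""
    try:
        etree.fromstring(xml_text)
        return True
    except etree.ParseError:
        # The parser rejects the extra double quote after the attribute value.
        return False


print(is_well_formed(good), is_well_formed(bad))
```

With the extra quote in place, the parser fails before any findtext call is reached, so that error needs fixing first.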

Upvotes: 2
