Reputation: 548
I am able to extract values from elements (using lxml in Python 2.7) when one namespace is used. However, I can't figure out how to extract values when a second namespace is used. I want to extract the value within //cc-cpl:MainClosedCaption/Id but I keep getting lxml.etree.XPathEvalError: Invalid expression errors.
To be specific, the value I'm trying to extract from my sample XML is urn:uuid:6ca58b51-9116-4131-8652-feaed20dca0d
Here's a snippet of the XML (from a Digital Cinema Package):
<?xml version="1.0" encoding="UTF-8"?>
<CompositionPlaylist xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#">
<Reel>
<Id>urn:uuid:58cf368f-ed30-40d8-9258-dd7572035b69</Id>
<MainPicture>
<Id>urn:uuid:afe91f7a-6451-4b9f-be2e-345f9a28da6d</Id>
</MainPicture>
<cc-cpl:MainClosedCaption xmlns:cc-cpl="http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#">
<Id>urn:uuid:6ca58b51-9116-4131-8652-feaed20dca0d</Id>
</cc-cpl:MainClosedCaption>
</Reel>
</CompositionPlaylist>
Here is an example of code that works:
from lxml import etree
cpl_parse = etree.parse('filename.xml')
pkl_namespace = cpl_parse.xpath('namespace-uri(.)')
xmluuid = cpl_parse.xpath('//ns:MainPicture/ns:Id',namespaces={'ns': pkl_namespace})
for i in xmluuid:
    print i.text
When I try to specify the following XPath instead: //ns:MainClosedCaption/ns:Id, I end up with errors.
When I specify the namespace with:
pkl_namespace = 'http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#"'
I receive an lxml.etree.XPathEvalError: Invalid expression error.
I know this is a stupid attempt, but the following produced the same error:
'//ns:cc-cpl:MainClosed Caption/ns:cc-cpl:Id'
I tried to include the two namespaces in a dictionary as in this answer: https://stackoverflow.com/a/36227869/2188572 , and while I don't get any errors, I end up with no values extracted. Here's my dictionary:
namespaces = {
'ns': 'http://www.digicine.com/PROTO-ASDCP-CPL-20040511#',
'ns2': 'http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#',
}
and my command:
xmluuid = cpl_parse.xpath('//ns:AssetList/ns2:MainClosedCaption/ns2:Id',namespaces=namespaces)
I found this question, "Extracting nested namespace from a xml using lxml", which deals with exactly the same kind of XML that I'm working on, but that request was to get the namespace URL, not the actual values of elements.
Edit: Using the method from the previous answer to extract the namespace, I tried the following, but got the same errors:
from lxml import etree
import sys
filename = sys.argv[1]
cpl_parse = etree.parse(filename)
pkl_namespace = etree.QName(cpl_parse.find('.//{*}MainClosedCaption')).namespace
print pkl_namespace
xmluuid = cpl_parse.xpath('//ns:cc-cpl:MainClosedCaption/ns:cc-cpl:Id',namespaces={'ns': pkl_namespace})
for i in xmluuid:
    print i.text
and here is the full traceback:
Traceback (most recent call last):
File "sub.py", line 8, in <module>
xmluuid = cpl_parse.xpath('//ns:cc-cpl:MainClosedCaption/ns:cc-cpl:Id',namespaces={'ns': pkl_namespace})
File "lxml.etree.pyx", line 2115, in lxml.etree._ElementTree.xpath (src/lxml/lxml.etree.c:57654)
File "xpath.pxi", line 370, in lxml.etree.XPathDocumentEvaluator.__call__ (src/lxml/lxml.etree.c:146564)
File "xpath.pxi", line 238, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:144962)
File "xpath.pxi", line 224, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:144817)
lxml.etree.XPathEvalError: Invalid expression
Upvotes: 2
Views: 792
Reputation: 288260
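The Id element in MainClosedCaption belongs to the 2004 namespace, not the 2007 one. Only an xmlns="..." attribute changes the default namespace; an attribute of the form xmlns:something="..." only declares a prefix, and that namespace applies only to names that explicitly use the prefix. Since Id is unprefixed, it inherits the default namespace from CompositionPlaylist.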
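You can check this yourself by asking lxml which namespace each element ended up in. The following is a minimal sketch, assuming the snippet from the question is embedded as a string; the variable names (XML, caption, caption_ns, id_ns, root_ns) are only for illustration.

```python
from lxml import etree

# The question's sample document, trimmed and embedded for illustration.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<CompositionPlaylist xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#">
  <Reel>
    <MainPicture>
      <Id>urn:uuid:afe91f7a-6451-4b9f-be2e-345f9a28da6d</Id>
    </MainPicture>
    <cc-cpl:MainClosedCaption xmlns:cc-cpl="http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#">
      <Id>urn:uuid:6ca58b51-9116-4131-8652-feaed20dca0d</Id>
    </cc-cpl:MainClosedCaption>
  </Reel>
</CompositionPlaylist>
"""

root = etree.fromstring(XML)

# {*} matches the element regardless of its namespace.
caption = root.find('.//{*}MainClosedCaption')

root_ns = etree.QName(root).namespace
caption_ns = etree.QName(caption).namespace
id_ns = etree.QName(caption[0]).namespace  # caption[0] is the nested <Id>

print(caption_ns)        # the 2007 (CC-CPL) namespace, from the cc-cpl prefix
print(id_ns)             # the 2004 (CPL) namespace, inherited default
print(id_ns == root_ns)
```

The prefixed MainClosedCaption reports the 2007 URI, while its unprefixed Id child reports the same URI as the root element.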
Try this:
from lxml import etree
cpl_parse = etree.parse('filename.xml')
xmluuid = cpl_parse.xpath('//proto2007:MainClosedCaption/proto2004:Id', namespaces={
'proto2004': 'http://www.digicine.com/PROTO-ASDCP-CPL-20040511#',
'proto2007': 'http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#',
})
for i in xmluuid:
    print(i.text)
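If you want something that runs without filename.xml, here is a rough self-contained sketch of the same query against the snippet from the question. The prefixes cpl and cc and the names XML and NS are my own choices, not anything lxml requires; XPath only cares that each prefix maps to the right URI. As an alternative, it also shows the ElementTree-style lookup with the namespace written inline as {uri}tag, which avoids the prefix dictionary altogether.

```python
from lxml import etree

# The question's sample document, trimmed and embedded for illustration.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<CompositionPlaylist xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#">
  <Reel>
    <Id>urn:uuid:58cf368f-ed30-40d8-9258-dd7572035b69</Id>
    <cc-cpl:MainClosedCaption xmlns:cc-cpl="http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#">
      <Id>urn:uuid:6ca58b51-9116-4131-8652-feaed20dca0d</Id>
    </cc-cpl:MainClosedCaption>
  </Reel>
</CompositionPlaylist>
"""

# Prefix names are arbitrary; only the URIs have to match the document.
NS = {
    'cpl': 'http://www.digicine.com/PROTO-ASDCP-CPL-20040511#',
    'cc': 'http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#',
}

root = etree.fromstring(XML)

# XPath: one prefix per step, each step using the namespace of that element.
ids = root.xpath('//cc:MainClosedCaption/cpl:Id/text()', namespaces=NS)

# ElementPath alternative: spell the namespace out as {uri}tag.
clark_path = './/{%s}MainClosedCaption/{%s}Id' % (NS['cc'], NS['cpl'])
clark_id = root.findtext(clark_path)

for value in ids:
    print(value)
print(clark_id)
```

Note that a step like ns:cc-cpl:Id can never work: an XPath name takes at most one prefix, which is why lxml rejects it as an invalid expression.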
Upvotes: 2