Reputation: 23
Decoding part of an XML document using xmlschema and XPath fails when selecting the item
elements whose doc_id attribute has the value 2.
simple.xml:
<?xml version="1.0" encoding="UTF-8"?>
<na:main
xmlns:na="ames"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="ames ./simple.xsd">
<na:item doc_id="1" ref_id="k1">content_k1</na:item>
<na:item doc_id="2" ref_id="k2">content_k2</na:item>
</na:main>
simple.xsd:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:na="ames"
targetNamespace="ames"
elementFormDefault="qualified">
<xs:complexType name="itemtype">
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="doc_id" type="xs:int" />
<xs:attribute name="ref_id" type="xs:string" />
</xs:extension>
</xs:simpleContent>
</xs:complexType>
<xs:complexType name="maintype">
<xs:sequence>
<xs:element name="item" maxOccurs="unbounded" type="na:itemtype" />
</xs:sequence>
</xs:complexType>
<xs:element name="main" type="na:maintype" />
</xs:schema>
>>> import xmlschema
>>> xs = xmlschema.XMLSchema('simple.xsd')
>>> xs.is_valid('simple.xml')
True
>>> xs.to_dict('simple.xml', ".//na:item[@doc_id=1]")
{'@doc_id': 1, '@ref_id': 'k1', '$': 'content_k1'}
>>> xs.to_dict('simple.xml', ".//na:item[@doc_id=2]")
---------------------------------------------------------------------------
XMLSchemaValidationError Traceback (most recent call last)
<ipython-input-57-8ff81c2eaf9c> in <module>
----> 1 xmlschema.XMLSchema('simple.xsd').to_dict('simple.xml', ".//na:item[@doc_id=2]")
/rao/uhome/rmol3/bin/anaconda3_rmol3_lglxs408_2/envs/MW41_Quicklook/lib/python3.7/site-packages/xmlschema/validators/schema.py in decode(self, source, path, schema_path, validation, *args, **kwargs)
1553 """
1554 data, errors = [], []
-> 1555 for result in self.iter_decode(source, path, schema_path, validation, *args, **kwargs):
1556 if not isinstance(result, XMLSchemaValidationError):
1557 data.append(result)
/rao/uhome/rmol3/bin/anaconda3_rmol3_lglxs408_2/envs/MW41_Quicklook/lib/python3.7/site-packages/xmlschema/validators/schema.py in iter_decode(self, source, path, schema_path, validation, process_namespaces, namespaces, use_defaults, decimal_type, datetime_types, converter, filler, fill_missing, max_depth, depth_filler, lazy_decode, **kwargs)
1540 else:
1541 reason = "{!r} is not an element of the schema".format(elem)
-> 1542 yield schema.validation_error(validation, reason, elem, source, namespaces)
1543 return
1544
/rao/uhome/rmol3/bin/anaconda3_rmol3_lglxs408_2/envs/MW41_Quicklook/lib/python3.7/site-packages/xmlschema/validators/xsdbase.py in validation_error(self, validation, error, obj, source, namespaces, **_kwargs)
904
905 if validation == 'strict' and error.elem is not None:
--> 906 raise error
907 return error
908
XMLSchemaValidationError: failed validating <Element '{ames}item' at 0x7eff7913db90> with XMLSchema10(basename='simple.xsd', namespace='ames'):
Reason: <Element '{ames}item' at 0x7eff7913db90> is not an element of the schema
Instance:
<na:item xmlns:na="ames" doc_id="2" ref_id="k2">content_k2</na:item>
Path: /na:main/na:item[2]
What is wrong with the XPath statement ".//na:item[@doc_id=2]"?
Upvotes: 2
Views: 1568
Reputation: 23
More examples that work:
>>> import xmlschema
>>> xs = xmlschema.XMLSchema('simple.xsd')
>>> xs.to_dict('simple.xml', ".//na:item[@doc_id='2']", schema_path='.//na:item')
{'@doc_id': 2, '@ref_id': 'k2', '$': 'content_k2'}
>>> xs.to_dict('simple.xml', ".//na:item[@doc_id=2]", schema_path='.//na:item')
{'@doc_id': 2, '@ref_id': 'k2', '$': 'content_k2'}
Upvotes: 0
Reputation: 752
The schema processor doesn't find the matching schema element for decoding the data because the provided path is not suitable for use on schema elements. You have to provide an explicit schema_path that points to the right XSD element:
>>> xs.to_dict("simple.xml", "/na:main/na:item[@doc_id=2]", schema_path="/na:main/na:item")
{'@doc_id': 2, '@ref_id': 'k2', '$': 'content_k2'}
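If you don't want to write the schema path by hand each time, here is a minimal sketch that derives it from the instance path by dropping the predicates. The helper names strip_predicates and decode_selection are made up for this example and are not part of xmlschema. It assumes that the instance path without its [...] parts resolves on the schema, which holds for the paths shown in this thread but is not guaranteed in general.

```python
import re


def strip_predicates(path: str) -> str:
    """Remove every [...] predicate from an XPath expression.

    Simplification: nested brackets, or a ']' inside a string
    literal, are not handled.
    """
    return re.sub(r"\[[^\]]*\]", "", path)


def decode_selection(schema_file: str, xml_file: str, path: str):
    """Decode the selected elements, passing a predicate-free schema_path.

    Hypothetical convenience wrapper around XMLSchema.to_dict().
    """
    import xmlschema  # third-party package, imported only when needed

    schema = xmlschema.XMLSchema(schema_file)
    return schema.to_dict(xml_file, path, schema_path=strip_predicates(path))


print(strip_predicates("/na:main/na:item[@doc_id=2]"))
# prints: /na:main/na:item
```

With the files from the question, decode_selection('simple.xsd', 'simple.xml', "/na:main/na:item[@doc_id=2]") should then make the same call as the one above.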
Upvotes: 2
Reputation: 2422
The XPath is not relevant - it can only be executed if the XML document can be parsed. But you are getting a Schema Validation error from the XML parser. It is claiming that the root tag in your document is not declared in your XSD. However, I have tested your XSD and XML in https://www.freeformatter.com/xml-validator-xsd.html and it validates OK.
Please check that the XML/XSD combination that you posted is the one that you tested with - that might explain the rather puzzling situation.
Upvotes: 0
Reputation: 22321
You can try the following XPath:
/na:main/na:item[@doc_id='2']
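To check the selection itself, independent of the schema, here is a small sketch using only the standard library. It assumes the sample document from the question, inlined without the XML declaration and the xsi attributes. ElementTree supports only a subset of XPath, so the relative form of the path with a quoted value is used here instead of the absolute one.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, reduced to what the query needs.
SIMPLE_XML = """\
<na:main xmlns:na="ames">
  <na:item doc_id="1" ref_id="k1">content_k1</na:item>
  <na:item doc_id="2" ref_id="k2">content_k2</na:item>
</na:main>
"""

# Prefix-to-URI mapping used to resolve "na:" in the path.
NAMESPACES = {"na": "ames"}

root = ET.fromstring(SIMPLE_XML)

# ElementTree compares attribute values as strings, hence the quotes.
matches = root.findall(".//na:item[@doc_id='2']", NAMESPACES)
selected = [(item.get("ref_id"), item.text) for item in matches]
print(selected)
# prints: [('k2', 'content_k2')]
```

This only shows that the path picks the intended element from the instance document. It does no schema validation or typed decoding, so doc_id stays a string here.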
Upvotes: 0