shoosh

Reputation: 78914

python lxml: import XSD from a buffer?

I'm using lxml from Python to validate an XML document against a matching XSD.
That XSD imports a second "common" XSD which contains some shared definitions.
The problem is that these XSDs do not exist locally as files. They are just buffers I hold in memory, but when the XSD uses <import> or <redefine>, libxml2 looks for the imported file in the current directory of the file system.

Is there a way to make it not do that? Maybe supply the imported XSD in advance?

lxml uses libxml2 and libxslt for parsing.
The imported XSD file is opened deep inside libxml2 code and does not go through Python's file handling, so just overriding open() doesn't work. It also seems that libxml2 does not have any facility for supplying a file resolver; it just calls fopen() directly.

So a solution will probably need to be at a higher level, maybe overriding a namespace or something like that?

Upvotes: 4

Views: 821

Answers (1)

kjhughes

Reputation: 111521

Rather than attack the problem via open()/fopen() overrides or changing source namespaces, consider using XML Catalogs or a custom URI resolver.

XML Catalogs allow you to control:

  1. Mapping an external entity's public identifier and/or system identifier to a URI reference.
  2. Mapping the URI reference of a resource (a namespace name, stylesheet, image, etc.) to another URI reference.

The libxml2 documentation describes how to use XML Catalogs with libxml2.

An XML Catalog will not directly support memory-based XSDs, since it only maps one URI reference to another, but it may still give you a better override point than the lower-level open()/fopen() methods.

However, a more promising approach may be to write a custom URI resolver. An example custom URI resolver is provided in the lxml documentation:

>>> from lxml import etree

>>> class DTDResolver(etree.Resolver):
...     def resolve(self, url, id, context):
...         print("Resolving URL '%s'" % url)
...         return self.resolve_string(
...             '<!ENTITY myentity "[resolved text: %s]">' % url, context)
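Adapting that idea to the question, here is a minimal sketch of a resolver that serves an imported XSD from memory. It assumes that lxml routes the schemaLocation lookups of xs:import through the resolvers registered on the parser that parsed the main schema, which is my understanding of how lxml behaves but is worth checking against your version. The names BufferResolver, MAIN_XSD, COMMON_XSD, common.xsd and the urn:example:common namespace are made up for illustration and do not come from the question.

```python
from lxml import etree

# Hypothetical "common" schema held only in memory.
COMMON_XSD = b"""<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:common">
  <xs:simpleType name="Code">
    <xs:restriction base="xs:string">
      <xs:maxLength value="3"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
"""

# Hypothetical main schema that imports the common one by file name.
MAIN_XSD = b"""<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:c="urn:example:common">
  <xs:import namespace="urn:example:common" schemaLocation="common.xsd"/>
  <xs:element name="item" type="c:Code"/>
</xs:schema>
"""


class BufferResolver(etree.Resolver):
    """Serve requested documents from a dict of in-memory buffers."""

    def __init__(self, buffers):
        super().__init__()
        self.buffers = buffers

    def resolve(self, url, public_id, context):
        # The URL may arrive relative or absolute, so match on the last segment.
        name = url.rsplit("/", 1)[-1]
        data = self.buffers.get(name)
        if data is None:
            return None  # fall through to lxml's default lookup
        return self.resolve_string(data, context)


# Register the resolver on the parser used for the *schema* document.
parser = etree.XMLParser()
parser.resolvers.add(BufferResolver({"common.xsd": COMMON_XSD}))

schema_root = etree.fromstring(MAIN_XSD, parser)
schema = etree.XMLSchema(schema_root)  # the import is resolved here

print(schema.validate(etree.fromstring(b"<item>abc</item>")))
print(schema.validate(etree.fromstring(b"<item>abcd</item>")))
```

The resolver has to be attached to the parser that reads the schema, not the one that reads the instance document, because the import is followed when the XMLSchema object is built. Returning None for unknown names leaves any other lookups to the default behaviour.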

Upvotes: 2
