Reputation: 80

Formal or Practical XML Tag Length Limit?

I've not managed to find any mention of a limit on XML tag length on the web. I'm looking to build XML Schemas that act as a specification for third parties to send data to us.

The Schema (and the data) are supposed to conform to our custom ontology/data dictionary thingy which is hierarchical and user-customizable.

The natural mapping is for nodes in the hierarchy to be used to name types and tags in the XSD/XML. However, because leaf node names in the ontology do not have to be unique, I am considering encoding the full path of nodes in the hierarchy as the tag name, suitably mangled for XML lexical rules.

So if my ontology has multiple 'lisa' nodes that mean different things because they sit at different places in the hierarchy, I could use the full path to each node to generate distinct XML type/tag names, so you can have

 <abe_homer_lisa> simpsons lisa ... </abe_homer_lisa>
 <applei_appleii_lisa> ... apple lisa </applei_appleii_lisa>
 <mona_lisa> and paintings </mona_lisa>

... data for any of the different 'lisa' types in the same file without ambiguity.
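To make the mangling idea concrete, here is a minimal sketch in Python. It assumes a deliberately conservative rule set (ASCII letters, digits, '.', '-' and '_' only, lowercased, joined with '_'); XML itself allows many more characters in names. The function names `mangle_segment` and `path_to_tag` are hypothetical, not part of any library.

```python
import re
from typing import Iterable

# Assumption: keep only a conservative ASCII subset of the characters
# that XML permits in names; everything else becomes '_'.
_INVALID = re.compile(r"[^A-Za-z0-9._-]")


def mangle_segment(name: str) -> str:
    """Lowercase one ontology node name and replace disallowed characters."""
    return _INVALID.sub("_", name.strip().lower())


def path_to_tag(path: Iterable[str], sep: str = "_") -> str:
    """Join the node names on a path into a single tag name."""
    tag = sep.join(mangle_segment(p) for p in path)
    # XML names must start with a letter or '_' (within this ASCII subset),
    # and names beginning with 'xml' are reserved, so prefix those cases.
    if not tag or not (tag[0].isalpha() or tag[0] == "_") or tag.startswith("xml"):
        tag = "_" + tag
    return tag


print(path_to_tag(["abe", "homer", "lisa"]))
print(path_to_tag(["Apple I", "Apple II", "Lisa"]))
```

One caveat with this sketch: because '_' is both the separator and the replacement character, two different paths can mangle to the same tag (for example `["abe_homer", "lisa"]` and `["abe", "homer_lisa"]`), so a real scheme would need an escaping rule or a collision check.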

I can't find anything on the web that specifies a maximum tag length (or a minimum supported tag length for standards-compliant engines). (Good summary of the lexical rules for XML here)

The same thing was asked about attribute length and if the standard specifies no limit for attributes then I doubt there's one for tags, but there may be a practical limit.

I suspect even a practical limit would be vastly bigger than my needs (I would expect things to be smaller than 255 chars most of the time); basically if the Java XML processors, standard ETL tools and the common XSLT processors can all handle tags much bigger than this then it won't be an issue.
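A quick empirical check is easy to script for whichever tool matters. The sketch below only exercises the expat-based parser bundled with Python's standard library, so it says nothing about Java processors, ETL tools or XSLT engines; the helper name `round_trips` and the lengths tried are arbitrary choices for illustration.

```python
import xml.etree.ElementTree as ET


def round_trips(name_length: int) -> bool:
    """Return True if a tag name of the given length parses back intact."""
    tag = "a" * name_length
    root = ET.fromstring(f"<{tag}>lisa</{tag}>")
    return root.tag == tag and root.text == "lisa"


# Try a few lengths around the sizes discussed in the question.
for n in (255, 1_000, 10_000):
    print(n, round_trips(n))
```

The same kind of probe could be repeated against each processor in the pipeline before committing to long names.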

Upvotes: 2

Answers (6)

googlydalek

Reputation: 80

Thanks to those who pointed out there might be more sensible ways to address the underlying problem (ensuring types/tag names in an XML schema are unique).

Re using a hierarchy of nodes to provide the context: I agree this would generally be appropriate. However (I didn't really explain my precise problem domain in the q) in this particular case, the user-configurable grouping of items in the tree-structure data dictionary I have to deal with is pretty arbitrary and has almost nothing to do with relationships in the data that the dictionary describes.

So in the

 <abe>
   <homer>
     <lisa>lisa1</lisa>
   </homer>
 </abe>

example, should another lisa node be under the same homer node, or a different one? Should the homers be under the same abe node or not? In the case of the data in question, the distinction is more or less meaningless: it would be like grouping data according to the page of an index it happened to be referenced on in a particular book. I suppose I could just make an arbitrary call and lock it down in the XSD.

If using something like XSL to extract data then it wouldn't matter, //abe/homer/lisa would get all of the lisa nodes irrespective of how they were grouped together. In practice someone is likely to be generating these from CSV files or whatever so I'd prefer as flat a structure as possible.
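A small sketch of that point, using the limited XPath subset in Python's `xml.etree.ElementTree` rather than a full XSLT processor. The `<data>` wrapper element and the sample values are assumptions made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Same three values, grouped under shared parents...
GROUPED = (
    "<data>"
    "<abe><homer><lisa>lisa1</lisa><lisa>lisa2</lisa><lisa>lisa3</lisa></homer></abe>"
    "</data>"
)

# ...or with every value under its own abe/homer pair.
SPLIT = (
    "<data>"
    "<abe><homer><lisa>lisa1</lisa></homer></abe>"
    "<abe><homer><lisa>lisa2</lisa></homer></abe>"
    "<abe><homer><lisa>lisa3</lisa></homer></abe>"
    "</data>"
)


def lisa_values(xml_text: str) -> list:
    """Collect the text of every abe/homer/lisa element, in document order."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.findall(".//abe/homer/lisa")]


print(lisa_values(GROUPED) == lisa_values(SPLIT))
```

Both documents yield the same list, which is why the grouping choice matters little to a consumer that queries by path.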

Ditto for namespaces: although they're designed for this very purpose (providing context for a name and ensuring that accidental clashes do not cause ambiguity when different types of data are bundled together in a file), in practice they'd add an extra layer of complexity to whoever generates the data from source systems.

In my precise circumstances, I expect name clashes in this arbitrary grouping to be pretty unlikely (and reflect poor usage), and hence just need reasonable handling, without imposing an undue penalty on the majority case.

Upvotes: 0

googlydalek

Reputation: 80

Based on the comments of Michael Kay (something of an expert on XML) and Mihai Stancu above I'd say the answer to my original question was:

No official limit
Tools likely to support 1000+ chars as an absolute minimum
Likely to hit problems in performance [given an XML tool processing those files would have to do lots of string indexing and comparison on very long strings] and usability way before then
XML namespaces and/or using the structure of the document tree to provide discriminating context would probably be better ways of "uniquifying" tag names

I was after an answer to that very specific question about legal tag length, and since I found the same question asked about attribute length but not tags I thought it might be worth having "an" answer around in case someone else googles it. Thanks to all respondents. Valid points about whether my design was sensible; I'll explain the rationale elsewhere.

Upvotes: 1

arayq2

Reputation: 2554

Contrary to conventional wisdom, I would strongly advise against using the so-called XML Namespaces mechanism. Over the longer haul, it will cause you pain. Just say no to namespaces. You do not need them.

Your intuition that elements can be distinguished by their context - represented, in this case, by their "paths" - is correct. However, your idea of encoding the entire path into the name of an element may not be optimal. Consider instead using the simple name, along with an attribute to hold the context or path. (Name this attribute 'context' or 'path' or anything more evocative!) This will be enough to distinguish the meanings.[*]

For varying content models, you can use a variant of the same technique. Give each different type a circumstantially convenient name, and record the "real" name in another attribute named, say 'ontology'.
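As a rough sketch of the attribute approach, the snippet below keeps the simple element name and stores the path in an attribute, then selects by that attribute with Python's standard library. The attribute name `path`, the `<data>` wrapper and the helper `by_path` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<data>"
    '<lisa path="abe/homer">simpsons lisa</lisa>'
    '<lisa path="applei/appleii">apple lisa</lisa>'
    '<lisa path="mona">painting</lisa>'
    "</data>"
)

root = ET.fromstring(DOC)


def by_path(parent, name: str, path: str) -> list:
    """Return the text of child elements called `name` whose path attribute matches."""
    return [e.text for e in parent.findall(f"./{name}[@path='{path}']")]


print(by_path(root, "lisa", "abe/homer"))
print(by_path(root, "lisa", "mona"))
```

The document stays flat, which suits generation from CSV, and the attribute value is free of the lexical restrictions that apply to element names.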

As for your question, the XML spec does not place any inherent limitation on the length of names, although for purely technical reasons you may find a limit of 65536 characters quoted in some places. That same "limitation" may also apply to the length of attribute value literals. At an average of 20 characters per atomic name, 20 levels of hierarchy would still amount to fewer than 500 characters for a path (about 400 plus separators), so you probably have little to worry about.

[*] Note: this technique is actually very old, but almost completely forgotten in XML mindspace. In HTML, for example, there is a single element type named INPUT to cover all sorts of GUI controls, and yet there is no confusion, thanks to the 'type' attribute.

Upvotes: -1

Michael Kay

Reputation: 163458

I think you're unlikely to find tools that can't handle names of say 1K characters, at which point you're hitting serious performance and usability problems rather than hard limits.

But your design is wrong. XML is hierarchic; take advantage of that fact rather than trying to fight it.

Upvotes: 7

Martin Honnen

Reputation: 167696

I would strongly suggest using an established XML mechanism to distinguish elements, namely namespaces. That way you would have e.g.

<lisa xmlns="http://example.com/simpsons">..</lisa>

<lisa xmlns="http://example.com/apple">...</lisa>

The W3C schema language, XSLT and XPath all fully support namespaces.
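For a feel of what consuming such a document looks like, here is a minimal sketch with Python's standard library. The `<data>` wrapper element and the prefixes `s` and `a` are assumptions chosen for the example; only the two namespace URIs come from the snippet above.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<data>"
    '<lisa xmlns="http://example.com/simpsons">simpsons lisa</lisa>'
    '<lisa xmlns="http://example.com/apple">apple lisa</lisa>'
    "</data>"
)

root = ET.fromstring(DOC)

# Prefixes are local to this query; only the URIs have to match the document.
NS = {"s": "http://example.com/simpsons", "a": "http://example.com/apple"}

simpsons = [e.text for e in root.findall("s:lisa", NS)]
apple = [e.text for e in root.findall("a:lisa", NS)]

print(simpsons, apple)
```

Both elements keep the local name `lisa`, and the namespace URI alone tells them apart.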

Upvotes: 3

Mihai Stancu

Reputation: 16117

There is no limit to tag name lengths that I know of. Even if the XML specification mentions no limit, there can still be implementation limits depending on the tool that parses the XML.

On the other hand, why not use XML's native and inherently hierarchical structure? Why encode everything as <abe_homer_lisa> instead of encoding it as:

<abe>
    <homer>
        <lisa>simpsons lisa</lisa>
    </homer>
</abe>
<applei>
    <appleii>
        <lisa> ... apple lisa </lisa>
    </appleii>
</applei>
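If the source data arrives flat (for instance as path/value rows from a CSV export), building the nested form is a short loop. This is only a sketch using Python's standard library; the `<data>` wrapper, the helper name `add_value` and the choice to reuse existing intermediate elements are assumptions.

```python
import xml.etree.ElementTree as ET


def add_value(root, path, text):
    """Create (or reuse) one nested element per path segment and store text in a new leaf."""
    node = root
    for segment in path[:-1]:
        child = node.find(segment)
        if child is None:
            child = ET.SubElement(node, segment)
        node = child
    leaf = ET.SubElement(node, path[-1])
    leaf.text = text
    return leaf


root = ET.Element("data")
add_value(root, ["abe", "homer", "lisa"], "simpsons lisa")
add_value(root, ["applei", "appleii", "lisa"], "apple lisa")

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Each tag name stays short, and the position in the tree carries the context that the long mangled name would otherwise encode.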

Upvotes: 4
