Reputation: 1206
I often come across XML where the same namespace is declared multiple times, instead of only once on a common parent of the elements that need it.
Is there a simple method or tool for extracting all namespace declarations in an XML document and relocating each of them to a node such that each namespace is declared only once? Preferably also with an option for all nodes to be prefixed with their namespace (rather than relying on a default namespace inherited from some parent). I find this yields more human-readable XML.
As an example, how could one automatically translate this
<m:Albums xmlns:m="http://www.example.com/music">
<m:Album xmlns:m="http://www.example.com/music">
<m:Artist xmlns:m="http://www.example.com/music">
<c:Name xmlns:c="http://www.example.com/common">
Sting
</c:Name>
</m:Artist>
<m:Title>
Mercury Falling
</m:Title>
</m:Album>
<Album xmlns="http://www.example.com/music">
<Artist>
<c:Name xmlns:c="http://www.example.com/common">
Maria Mena
</c:Name>
</Artist>
<Title xmlns="http://www.example.com/music">
Weapon in Mind
</Title>
</Album>
</m:Albums>
into this?
<m:Albums xmlns:m="http://www.example.com/music" xmlns:c="http://www.example.com/common">
<m:Album>
<m:Artist>
<c:Name>
Sting
</c:Name>
</m:Artist>
<m:Title>
Mercury Falling
</m:Title>
</m:Album>
<m:Album>
<m:Artist>
<c:Name>
Maria Mena
</c:Name>
</m:Artist>
<m:Title>
Weapon in Mind
</m:Title>
</m:Album>
</m:Albums>
Upvotes: 3
Views: 3791
Reputation: 1206
As a better answer to my own question: I stumbled across this external XSLT transformation (EDIT: XSLT 2.0 version), which does exactly what I wanted.
The result of transforming the question's input XML with this XSLT, using Saxon:
<m:Albums xmlns:m="http://www.example.com/music" xmlns:c="http://www.example.com/common">
<m:Album>
<m:Artist>
<c:Name>
Sting
</c:Name>
</m:Artist>
<m:Title>
Mercury Falling
</m:Title>
</m:Album>
<m:Album>
<m:Artist>
<c:Name>
Maria Mena
</c:Name>
</m:Artist>
<m:Title>
Weapon in Mind
</m:Title>
</m:Album>
</m:Albums>
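Since prefixes are only shorthand for namespace URIs, one way to sanity-check a rewrite like this is to compare the expanded element names of the input and the output. Below is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The function names are made up for illustration, and it only compares element names and stripped text; attributes, comments, and tail text are ignored.

```python
import xml.etree.ElementTree as ET


def expanded_names(xml_text):
    """Return (tag, stripped text) for every element in document order.

    ElementTree reports tags as '{namespace-uri}local-name', so the
    prefix chosen in the source document does not affect the result.
    """
    root = ET.fromstring(xml_text)
    return [(el.tag, (el.text or "").strip()) for el in root.iter()]


def same_namespaced_content(before, after):
    """True if both documents have the same elements in the same namespaces."""
    return expanded_names(before) == expanded_names(after)
```

If this returns True for the original document and the transformed one, the transformation changed only where and how the namespaces are declared, not which namespace each element belongs to.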
Upvotes: 1
Reputation: 1206
As a partial answer to my own question, I found that the Unix command xmllint --nsclean partially solves the problem, but it does not eliminate all duplicate namespace declarations. When applied to the example XML from the question, it yields the following:
<m:Albums xmlns:m="http://www.example.com/music">
<m:Album>
<m:Artist>
<c:Name xmlns:c="http://www.example.com/common">
Sting
</c:Name>
</m:Artist>
<m:Title>
Mercury Falling
</m:Title>
</m:Album>
<Album xmlns="http://www.example.com/music">
<Artist>
<c:Name xmlns:c="http://www.example.com/common">
Maria Mena
</c:Name>
</Artist>
<Title>
Weapon in Mind
</Title>
</Album>
</m:Albums>
This eliminates namespace declarations that are already in scope from a parent node. However, it does not pull duplicate namespace declarations up to a common parent (e.g. the c:Name nodes), nor does it remove a duplicate default namespace by converting the affected nodes to use an equivalent prefixed namespace (e.g. the Album node in the default namespace, and its children).
Still hoping for a solution that can also remove duplicate namespace declarations in the cases where xmllint fails.
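For what it is worth, here is a minimal sketch of one way to cover those cases with Python's standard library rather than xmllint. It relies on the fact that xml.etree.ElementTree serializes all namespace declarations on the root element and always writes prefixed names. The function name hoist_namespaces is hypothetical, and the sketch assumes that each prefix is bound to a single URI throughout the document. Note that register_namespace changes global state, that comments are dropped by the default parser, and that prefixes used only inside attribute values or text are not detected.

```python
import io
import xml.etree.ElementTree as ET


def hoist_namespaces(xml_text):
    """Re-serialize xml_text with each namespace declared once, on the root.

    Assumption: a given prefix maps to one URI in the whole document.
    """
    prefix_for_uri = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events report every (prefix, uri) declaration in document order.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        # Skip default-namespace declarations (empty prefix) and keep the
        # first non-empty prefix seen for each URI.
        if prefix and uri not in prefix_for_uri:
            prefix_for_uri[uri] = prefix

    # Tell the serializer which prefix to use for each URI (global registry).
    for uri, prefix in prefix_for_uri.items():
        ET.register_namespace(prefix, uri)

    # Parsing resolves every tag to '{uri}local', so elements that used a
    # default namespace are written back with the registered prefix.
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")
```

A namespace that only ever appears as a default namespace has no prefix to reuse, so ElementTree would generate one (ns0, ns1, ...) for it. In the example from the question, the music namespace is also declared with the m prefix, so that prefix is reused for the second Album element and its children.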
Upvotes: 2