Anders Rabo Thorbeck

Reputation: 1206

How to eliminate duplicate XML namespace definitions?

I often come across XML where the same namespace is defined multiple times, instead of only at a parent element of the elements where it is needed.

Is there a simple method or tool for extracting all namespace definitions in an XML document and relocating each of them to a node such that every namespace is defined only once? Preferably it would also have an option for all nodes to be prefixed with their namespace (rather than inheriting a default namespace from some parent). I find this would yield more human-readable XML.

As an example, how could one automatically translate this

<m:Albums xmlns:m="http://www.example.com/music">
  <m:Album xmlns:m="http://www.example.com/music">
    <m:Artist xmlns:m="http://www.example.com/music">
      <c:Name xmlns:c="http://www.example.com/common">
        Sting
      </c:Name>
    </m:Artist>
    <m:Title>
      Mercury Falling
    </m:Title>
  </m:Album>
  <Album xmlns="http://www.example.com/music">
    <Artist>
      <c:Name xmlns:c="http://www.example.com/common">
        Maria Mena
      </c:Name>
    </Artist>
    <Title xmlns="http://www.example.com/music">
      Weapon in Mind
    </Title>
  </Album>
</m:Albums>

into this?

<m:Albums xmlns:m="http://www.example.com/music" xmlns:c="http://www.example.com/common">
  <m:Album>
    <m:Artist>
      <c:Name>
        Sting
      </c:Name>
    </m:Artist>
    <m:Title>
      Mercury Falling
    </m:Title>
  </m:Album>
  <m:Album>
    <m:Artist>
      <c:Name>
        Maria Mena
      </c:Name>
    </m:Artist>
    <m:Title>
      Weapon in Mind
    </m:Title>
  </m:Album>
</m:Albums>

Upvotes: 3

Views: 3791

Answers (2)

Anders Rabo Thorbeck

Reputation: 1206

As a better answer to my own question, I came across this external XSLT transformation (EDIT: XSLT 2.0 version), which does exactly what I wanted.

The result from transforming the question's input XML using this XSLT with Saxon:

<m:Albums xmlns:m="http://www.example.com/music" xmlns:c="http://www.example.com/common">
  <m:Album>
    <m:Artist>
      <c:Name>
        Sting
      </c:Name>
    </m:Artist>
    <m:Title>
      Mercury Falling
    </m:Title>
  </m:Album>
  <m:Album>
    <m:Artist>
      <c:Name>
        Maria Mena
      </c:Name>
    </m:Artist>
    <m:Title>
      Weapon in Mind
    </m:Title>
  </m:Album>
</m:Albums>
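
For readers without an XSLT 2.0 processor at hand, a similar result can be sketched with Python's standard library. This is not the XSLT referred to above, only a rough equivalent: `xml.etree.ElementTree` writes all namespace declarations on the root element when serializing, so it is enough to register the prefixes found in the input first. The function name `hoist_namespaces` and the shortened `SAMPLE` document are made up for illustration. The sketch assumes each prefix is bound to a single URI throughout the document.

```python
import io
import xml.etree.ElementTree as ET

# Shortened version of the question's input (hypothetical sample).
SAMPLE = """<m:Albums xmlns:m="http://www.example.com/music">
  <m:Album xmlns:m="http://www.example.com/music">
    <m:Artist><c:Name xmlns:c="http://www.example.com/common">Sting</c:Name></m:Artist>
  </m:Album>
  <Album xmlns="http://www.example.com/music">
    <Artist><c:Name xmlns:c="http://www.example.com/common">Maria Mena</c:Name></Artist>
  </Album>
</m:Albums>"""


def hoist_namespaces(xml_text):
    """Re-serialize xml_text with each namespace declared once, on the root."""
    prefixes = {}  # namespace URI -> first non-empty prefix seen for it
    root = None
    events = ("start", "start-ns")
    for event, payload in ET.iterparse(io.StringIO(xml_text), events=events):
        if event == "start-ns":
            prefix, uri = payload
            # Skip default-namespace declarations (empty prefix) so that
            # every element ends up with an explicit prefix.
            if prefix and uri not in prefixes:
                prefixes[uri] = prefix
        elif root is None:
            root = payload  # the first "start" event is the root element
    for uri, prefix in prefixes.items():
        # Note: register_namespace changes module-global state and rejects
        # prefixes of the form "ns<digits>".
        ET.register_namespace(prefix, uri)
    return ET.tostring(root, encoding="unicode")


print(hoist_namespaces(SAMPLE))
```

A namespace that only ever appears as a default namespace (never with a prefix) gets a generated prefix such as `ns0` from ElementTree. The order of the `xmlns:` attributes on the root may also differ from the input.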

Upvotes: 1

Anders Rabo Thorbeck

Reputation: 1206

As a partial answer to my own question, I found that the Unix command

xmllint --nsclean

partially solves the problem, but it does not eliminate all duplicate namespaces. When applied to my example XML from the question, it yields the following.

<m:Albums xmlns:m="http://www.example.com/music">
  <m:Album>
    <m:Artist>
      <c:Name xmlns:c="http://www.example.com/common">
        Sting
      </c:Name>
    </m:Artist>
    <m:Title>
      Mercury Falling
    </m:Title>
  </m:Album>
  <Album xmlns="http://www.example.com/music">
    <Artist>
      <c:Name xmlns:c="http://www.example.com/common">
        Maria Mena
      </c:Name>
    </Artist>
    <Title>
      Weapon in Mind
    </Title>
  </Album>
</m:Albums>

This eliminates namespaces already declared in a parent node. However, it does not pull duplicate namespace declarations up to a common parent (e.g. the c:Name nodes), nor does it remove a duplicate default namespace by converting the affected nodes to use an equivalent non-default namespace (e.g. the Album node in the default namespace, and its children).
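
To check which declarations are still repeated after such a cleanup, one can count them. The following is a minimal sketch using Python's standard library. The function name `count_namespace_declarations` and the condensed `SAMPLE` (modelled on the xmllint output above) are assumptions for illustration. The empty prefix `""` stands for a default namespace declaration.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Condensed form of the xmllint --nsclean output (hypothetical sample).
SAMPLE = """<m:Albums xmlns:m="http://www.example.com/music">
  <m:Album>
    <m:Artist><c:Name xmlns:c="http://www.example.com/common">Sting</c:Name></m:Artist>
  </m:Album>
  <Album xmlns="http://www.example.com/music">
    <Artist><c:Name xmlns:c="http://www.example.com/common">Maria Mena</c:Name></Artist>
  </Album>
</m:Albums>"""


def count_namespace_declarations(xml_text):
    """Count how often each (prefix, uri) pair is declared in xml_text."""
    counts = Counter()
    # The parser emits one "start-ns" event per xmlns declaration it meets.
    for _event, (prefix, uri) in ET.iterparse(
        io.StringIO(xml_text), events=("start-ns",)
    ):
        counts[(prefix, uri)] += 1
    return counts


counts = count_namespace_declarations(SAMPLE)
# Keep only the declarations that occur more than once.
duplicates = {key: n for key, n in counts.items() if n > 1}
print(duplicates)
```

Note that this counts identical (prefix, URI) pairs only. A URI declared once with a prefix and once as the default namespace, like the music namespace here, shows up as two separate entries.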

I am still hoping for a solution that also removes duplicate namespaces in the cases where xmllint fails.

Upvotes: 2
