Reputation: 4336
Given some XML like the following, how can you completely remove a particular namespace, including its declaration, from each element?
<?xml version="1.0" encoding="UTF-8"?>
<document xmlns:my-co="http://www.example.com/2015/co">
<my-namespace:first xmlns:my-namespace="http://www.example.com/2015/ns">
<element my-namespace:id="1">
</element>
</my-namespace:first>
<second>
<my-namespace:element xmlns:my-namespace="http://www.example.com/2015/ns" my-co:id="2">
</my-namespace:element>
</second>
</document>
Notice that there is no xmlns:my-namespace declaration at the root level, and that the two declarations sit in different parts and at different levels of the XML structure.
How can you efficiently remove just the namespace my-namespace without having to check each node in code?
This is what the XML should look like afterwards:
<?xml version="1.0" encoding="UTF-8"?>
<document xmlns:my-co="http://www.example.com/2015/co">
<first>
<element id="1">
</element>
</first>
<second>
<element my-co:id="2">
</element>
</second>
</document>
Upvotes: 3
Views: 4725
Reputation: 525
SimpleXML has a function to extract all namespace information, and the DOM extension has a function to remove a namespace if you know what to remove.
Here is a simple function that extracts the namespace information from a DOMDocument by importing it into a SimpleXMLElement, and then uses the resulting namespace array to remove the namespaces from the DOMDocument natively:
function removeNamespaces(DOMDocument $domdoc)
{
    // convert to a SimpleXML element
    $simplexml = simplexml_import_dom($domdoc);
    // get all the namespaces
    $namespaces = $simplexml->getDocNamespaces(true, true);
    // loop through the namespaces
    foreach ($namespaces as $prefix => $uri) {
        // remove the namespace declaration from the root element
        $domdoc->documentElement->removeAttributeNS($uri, $prefix);
    }
    // return cleaned doc
    return $domdoc;
}
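To see what the loop above iterates over, here is a minimal sketch that prints the array returned by getDocNamespaces(true, true) for the XML from the question. The inline $xml string and the variable names are assumptions made for illustration only.

```php
<?php
// The question's "before" XML (XML declaration omitted), inlined for illustration.
$xml = <<<'XML'
<document xmlns:my-co="http://www.example.com/2015/co">
<my-namespace:first xmlns:my-namespace="http://www.example.com/2015/ns">
<element my-namespace:id="1">
</element>
</my-namespace:first>
<second>
<my-namespace:element xmlns:my-namespace="http://www.example.com/2015/ns" my-co:id="2">
</my-namespace:element>
</second>
</document>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// recursive = true: include namespaces declared on descendant elements
// from_root = true: start at the document root rather than the current node
$namespaces = simplexml_import_dom($doc)->getDocNamespaces(true, true);

// The result is an array of prefix => URI pairs
print_r($namespaces);
```

Because the array is keyed by prefix, it can hold only one URI per prefix, and the function above passes each pair to removeAttributeNS() on the root element only.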
Upvotes: 1
Reputation: 108
We wanted to remove namespaces as well (in our case all of them, not just a specific one), but the solution above only worked partially. If a prefix is declared multiple times with different URIs, the first answer doesn't remove them all.
A solution that worked for us in all use cases was to use SimpleXMLElement to search for namespaces and SimpleXMLElement->xpath() to search for nodes in each namespace, then convert to a DOMElement to remove the namespace. For us, memory usage was better with that approach than with loading the XML into DOM and using DOMXPath.
A sample XML to test against:
<xml xmlns="http://foo" xmlns:bar="http://bar" xmlns:baz="http://baz">
<foo bam="hoi">Hello World</foo>
<foo baz:bam="hoi">Hello World</foo>
<bar:foo bam="hoi">Hello World</bar:foo>
<bar:foo bar:bam="hoi">Hello World</bar:foo>
<bar:foo baz:bam="hoi">Hello World</bar:foo>
<baz:foo bar:bam="hoi">Hello World</baz:foo>
<plop:foo xmlns:plop="http://plop" xmlns:bar="http://baasdr">
<bar:foo>
<bar:foo xmlns:plop="http://plop">
<plop:foo>
<plop:foo>
<plop:foo xmlns:bar="http://bar">
<bar:baz>Hello World</bar:baz>
</plop:foo>
</plop:foo>
</plop:foo>
</bar:foo>
</bar:foo>
</plop:foo>
</xml>
The sample code to remove namespaces:
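function removeNamespaces(SimpleXMLElement $xml) {
    while ($namespaces = $xml->getDocNamespaces(true, true)) {
        // take the first namespace that is still present in the document
        $uri = reset($namespaces);
        $prefix = key($namespaces);
        // find the elements on which this prefix/URI pair first comes into scope
        $elements = $xml->xpath("//*[namespace::*[name() = '{$prefix}' and . = '{$uri}'] and not (../namespace::*[name() = '{$prefix}' and . = '{$uri}'])]");
        $element = dom_import_simplexml($elements[0]);
        // try to remove every known namespace from the first such element
        foreach ($namespaces as $prefix => $uri) {
            $element->removeAttributeNS($uri, $prefix);
        }
        // recreate the object to avoid crashes after the DOM manipulation
        $xml = new SimpleXMLElement($xml->asXML());
    }
    return $xml;
}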
The SimpleXMLElement is recreated because, in some cases, accessing or manipulating the SimpleXMLElement after using DOM to remove the namespaces crashed PHP (5.6) with a segmentation fault. Luckily, asXML() kept working, which allows this workaround: a newly created object did not cause crashes.
If you want to remove only specific namespaces, you could rewrite the function and/or the XPath so that it only searches for those namespaces. Note that you'll then also have to change the use of SimpleXMLElement->getDocNamespaces(true, true).
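As a starting point for such a rewrite, the sketch below shows only the lookup step for one specific prefix/URI pair. It does not perform the removal. The helper name find_binding_roots and the trimmed-down sample document are hypothetical, and the sketch assumes that neither the prefix nor the URI contains a single quote.

```php
<?php
// Hypothetical helper: returns the elements on which the given prefix/URI
// binding first comes into scope (the parent does not have the same binding).
function find_binding_roots(SimpleXMLElement $xml, $prefix, $uri) {
    $query = "//*[namespace::*[name() = '{$prefix}' and . = '{$uri}']"
           . " and not(../namespace::*[name() = '{$prefix}' and . = '{$uri}'])]";
    // xpath() does not return an array on failure, so fall back to an empty one
    return $xml->xpath($query) ?: [];
}

// A shortened variant of the sample XML above, for illustration only.
$sample = <<<'XML'
<xml xmlns="http://foo" xmlns:bar="http://bar">
  <bar:foo>Hello World</bar:foo>
  <plop:foo xmlns:plop="http://plop" xmlns:bar="http://baasdr">
    <bar:foo>
      <plop:foo xmlns:bar="http://bar">
        <bar:baz>Hello World</bar:baz>
      </plop:foo>
    </bar:foo>
  </plop:foo>
</xml>
XML;

$names = [];
foreach (find_binding_roots(new SimpleXMLElement($sample), 'bar', 'http://bar') as $el) {
    // Convert to DOM to read the qualified name (prefix included)
    $names[] = dom_import_simplexml($el)->nodeName;
}
print_r($names); // lists "xml" and "plop:foo"
```

The prefix bar is rebound to a different URI in between, which is why the innermost plop:foo element is reported in addition to the root element.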
One additional note: for performance reasons, we only look up the first node of the first namespace and then try to remove all namespaces from that node. We sometimes have to work with horrible XML files that contain over 100 different namespaces and can be several MB in size. Running an XPath query for each namespace was very slow on those documents. This solution drastically improves performance because it assumes that most, if not all, namespaces are declared on the same element (usually the root element). So instead of looping through the namespaces and running an XPath query for each one, it tries to remove all namespaces from the first element found for the first namespace in the document, and then re-checks whether any namespaces are left. Namespaces declared later in the document are still removed as well. If the namespaces are spread more widely through the document, a different approach may be better.
Upvotes: 5
Reputation: 4336
The following code does the trick:
// Removes the namespace $ns from all elements in the DOMDocument $doc
function remove_dom_namespace($doc, $ns) {
    $finder = new DOMXPath($doc);
    $nodes = $finder->query("//*[namespace::{$ns} and not(../namespace::{$ns})]");
    foreach ($nodes as $n) {
        $ns_uri = $n->lookupNamespaceURI($ns);
        $n->removeAttributeNS($ns_uri, $ns);
    }
}

// Usage:
$mydoc = new DOMDocument();
$mydoc->load('test.xml'); // Load "before" XML
remove_dom_namespace($mydoc, 'my-namespace');
// Prints the above "after" XML
echo $mydoc->saveXML(null, LIBXML_NOEMPTYTAG);
The XPath query finds all nodes that have a namespace node called $ns and whose parent node doesn't have the same namespace. This will find /document/my-namespace:first and /document/second/my-namespace:element, but not /document/my-namespace:first/element, because its parent also has the namespace my-namespace. The code then removes the specified namespace from each element found. Removing the namespace from an element automatically removes it from all of its children too.
A lot of real-world XML documents have all their xmlns declarations on the root element, but this code handles them anywhere.
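To check which elements the query selects before removing anything, a small sketch like the following can help. It runs only the XPath query against the XML from the question and collects the qualified names of the matches. The inline $xml string and the $names variable are assumptions made for illustration.

```php
<?php
// The question's "before" XML (XML declaration omitted), inlined for illustration.
$xml = <<<'XML'
<document xmlns:my-co="http://www.example.com/2015/co">
<my-namespace:first xmlns:my-namespace="http://www.example.com/2015/ns">
<element my-namespace:id="1">
</element>
</my-namespace:first>
<second>
<my-namespace:element xmlns:my-namespace="http://www.example.com/2015/ns" my-co:id="2">
</my-namespace:element>
</second>
</document>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$ns = 'my-namespace';
$finder = new DOMXPath($doc);

// Same query as in remove_dom_namespace(): elements that have the namespace
// in scope while their parent does not.
$names = [];
foreach ($finder->query("//*[namespace::{$ns} and not(../namespace::{$ns})]") as $n) {
    $names[] = $n->nodeName;
}
print_r($names); // lists "my-namespace:first" and "my-namespace:element"
```

The unprefixed element inside my-namespace:first is not listed, because its parent already has the namespace in scope.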
Upvotes: 3