I've recently had serious grief with XML namespaces and with handling them effectively in PHP. Here's a sample of the worst kind of culprit:
<dc:type xsi:type="TypeName" xsi:identifier="NN">Others</dc:type>
Using preg_replace, I managed to "un-namespace" the tags (without breaking URLs) like this:
$xml = preg_replace(
'/<(\/?)([^:" ].*):([^>\/ ].*)(\/?)>/msiU',
'<$1$2_$3$4>',
$x->readOuterXML()
);
# <dc_type xsi:type="TypeName" xsi:identifier="NN">Others</dc_type>
What I couldn't do, through lack of regular expression wizardry, was convert all namespaced attributes into the same format. I managed to convert the first occurrence, but I don't know how to make the match repeat. I deleted the code because it didn't work (and I can't remember exactly what I did), but the result looked like this:
<dc_type xsi_type="TypeName" xsi:identifier="NN">Others</dc_type>
Whereas what would be beautiful is this:
<dc_type xsi_type="TypeName" xsi_identifier="NN">Others</dc_type>
Are there any regex masters out there who can help?
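Since preg_replace already replaces every match rather than just the first, the repeatable part mainly needs a pattern that matches one attribute name at a time. Below is a minimal sketch, assuming there is no `>` inside attribute values and no comments or CDATA sections that contain tags; `unNamespaceAttributes` is a hypothetical helper name. It visits each start tag with preg_replace_callback and rewrites every `prefix:name=` inside it.

```php
<?php
// Hypothetical helper: rename every "prefix:name" attribute inside start tags
// to "prefix_name". Assumptions: no '>' inside attribute values, and no
// comments/CDATA/PIs that contain tag-like text.
function unNamespaceAttributes(string $xml): string
{
    // Visit each start (or empty-element) tag on its own; closing tags
    // start with "</" and are skipped.
    return preg_replace_callback(
        '/<[A-Za-z_][^>]*>/',
        function (array $tag): string {
            // Inside one tag, rewrite every "prefix:local=" attribute name.
            // preg_replace replaces all matches, not just the first one.
            return preg_replace(
                '/(\s)([A-Za-z_][\w.-]*):([A-Za-z_][\w.-]*)(\s*=)/',
                '$1$2_$3$4',
                $tag[0]
            );
        },
        $xml
    );
}

echo unNamespaceAttributes(
    '<dc_type xsi:type="TypeName" xsi:identifier="NN">Others</dc_type>'
), "\n";
```

Run on the already un-namespaced tag, this should give `<dc_type xsi_type="TypeName" xsi_identifier="NN">Others</dc_type>`. Note that it also turns `xmlns:prefix` declarations into plain `xmlns_prefix` attributes, and an attribute value that itself contains something like ` a:b=` would be rewritten too, which is one reason to prefer a real parser.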
Upvotes: 1
Views: 6106
Reputation: 4336
I was looking for the same thing, but I know better than to run regular expressions against XML. (Search for just about any Stack Overflow question about parsing XML/HTML with regex and read the whole answer to find out why; you'll know it when you see it!)
Here is the code I came up with:
<?php
// Some test XML
$xml = <<<XML
<root xmlns:a="bogus.a" xmlns:b="bogus.b">
    <a:first>
        <b:second>text</b:second>
    </a:first>
</root>
XML;
// Load with SimpleXML (for getDocNamespaces()) and copy the tree into a DOMDocument
$sxe = new SimpleXMLElement($xml);
$dom_sxe = dom_import_simplexml($sxe);
$dom = new DOMDocument('1.0');
$dom_sxe = $dom->importNode($dom_sxe, true);
$dom_sxe = $dom->appendChild($dom_sxe);
$element = $dom->childNodes->item(0); // the root element
// See what the XML looks like before the transformation
echo "<pre>\n" . htmlspecialchars($dom->saveXML()) . "\n</pre>";
// Remove each namespace declared on the document (prefix => URI) from the root
foreach ($sxe->getDocNamespaces() as $name => $uri) {
    $element->removeAttributeNS($uri, $name);
}
// See what the XML looks like after the transformation
echo "<pre>\n" . htmlspecialchars($dom->saveXML()) . "\n</pre>";
?>
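The loop above removes the namespace declarations themselves. If you want the asker's `prefix_local` names instead, the same "use a real parser" idea can be sketched by copying the tree into a fresh DOMDocument and renaming as you go. This is a minimal sketch, not part of the answer above: `copyWithoutNamespaces` is a hypothetical helper, only elements and text/CDATA are copied (comments and processing instructions are dropped), and the `xmlns:dc`/`xmlns:xsi` URIs are added because the question's snippet doesn't show its declarations.

```php
<?php
// Hypothetical helper: copy a DOM tree into a new, namespace-free document,
// turning "prefix:local" into "prefix_local" for elements and attributes.
// Only elements and text (including CDATA, as plain text) are copied.
function copyWithoutNamespaces(DOMNode $from, DOMDocument $doc, DOMNode $to): void
{
    foreach ($from->childNodes as $child) {
        if ($child instanceof DOMElement) {
            // nodeName is the qualified name, e.g. "dc:type"
            $copy = $doc->createElement(str_replace(':', '_', $child->nodeName));
            // In PHP's classic DOM extension, xmlns:* declarations are not
            // listed in ->attributes, so they are not copied.
            foreach ($child->attributes as $attr) {
                $copy->setAttribute(str_replace(':', '_', $attr->nodeName), $attr->value);
            }
            $to->appendChild($copy);
            copyWithoutNamespaces($child, $doc, $copy);
        } elseif ($child instanceof DOMText) {
            $to->appendChild($doc->createTextNode($child->data));
        }
    }
}

$source = new DOMDocument();
$source->loadXML(
    '<dc:type xmlns:dc="http://purl.org/dc/elements/1.1/"'
    . ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    . ' xsi:type="TypeName" xsi:identifier="NN">Others</dc:type>'
);
$target = new DOMDocument('1.0');
copyWithoutNamespaces($source, $target, $target);
$result = $target->saveXML($target->documentElement);
echo $result, "\n";
```

Because the new elements are created without any namespace, there is nothing left for the serializer to declare.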
Upvotes: 5
Reputation: 197785
To rewrite a complete XML document, for example renaming element or attribute names or changing namespace-related data such as xmlns attributes, you can use the expat-based XML Parser extension.
It works by parsing the file and changing the output on the fly. The parser invokes callback functions (so-called handlers) that receive the data already parsed, for example the element's name as a string and its attributes as an array.
You can then change these values on the fly and output the (potentially changed) data.
Done this way, you no longer need to worry about regular expressions (which are non-trivial to get right for proper XML parsing).
You can find some boilerplate code to get this started in a previous answer of mine.
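A minimal sketch of that handler-based approach, assuming UTF-8 input and that dropping comments, processing instructions and the doctype is acceptable; `rewriteNames` is a hypothetical function name. The parser is created without namespace processing, so expat reports names such as `dc:type` and `xsi:type` verbatim, and the handlers re-serialize them with the colon replaced.

```php
<?php
// Hypothetical sketch: re-serialize a document while parsing it with the
// expat-based XML Parser extension, turning "prefix:name" into "prefix_name"
// for elements and attributes. xmlns:* declarations are renamed like any
// other attribute; comments, PIs and the doctype are not reproduced.
function rewriteNames(string $xml): string
{
    $out = '';
    $rename = fn (string $name): string => str_replace(':', '_', $name);

    // Non-namespace mode: qualified names arrive unchanged, e.g. "dc:type".
    $parser = xml_parser_create('UTF-8');
    // Expat folds names to upper case by default; keep them as written.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attrs) use (&$out, $rename): void {
            $out .= '<' . $rename($name);
            foreach ($attrs as $attr => $value) {
                // Values arrive decoded, so escape them again on output.
                $out .= ' ' . $rename($attr) . '="'
                    . htmlspecialchars($value, ENT_QUOTES | ENT_XML1) . '"';
            }
            $out .= '>';
        },
        function ($parser, string $name) use (&$out, $rename): void {
            $out .= '</' . $rename($name) . '>';
        }
    );
    // Text may arrive in several chunks; each is escaped and appended.
    xml_set_character_data_handler(
        $parser,
        function ($parser, string $data) use (&$out): void {
            $out .= htmlspecialchars($data, ENT_NOQUOTES | ENT_XML1);
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $out;
}

echo rewriteNames('<dc:type xsi:type="TypeName" xsi:identifier="NN">Others</dc:type>'), "\n";
```

Empty elements such as `<a/>` come back as `<a></a>`, since expat reports them as a start and an end event.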
Upvotes: 1