Reputation: 183
I was successfully using the following code to merge multiple large XML files into a new (larger) XML file. Found at least part of this on StackOverflow
$docList = new DOMDocument(); $root = $docList->createElement('documents'); $docList->appendChild($root); $doc = new DOMDocument(); foreach(xmlFilenames as $xmlfilename) { $doc->load($xmlfilename); $xmlString = $doc->saveXML($doc->documentElement); $xpath = new DOMXPath($doc); $query = self::getQuery(); // this is the name of the ROOT element $nodelist = $xpath->evaluate($query, $doc->documentElement); if( $nodelist->length > 0 ) { $node = $docList->importNode($nodelist->item(0), true); $xmldownload = $docList->createElement('document'); if (self::getShowFileName()) $xmldownload->setAttribute("filename", $filename); $xmldownload->appendChild($node); $root->appendChild($xmldownload); } } $newXMLFile = self::getNewXMLFile(); $docList->save($newXMLFile);
I started running into OUT OF MEMORY issues when the number of files grew as did the size of them.
I found an article here which explained the issue and recommended using XMLWriter
So, now trying to use PHP XMLWriter to merge multiple large XML files together into a new (larger) XML file. Later, I will execute xpath against the new file.
Code:
$xmlWriter = new XMLWriter(); $xmlWriter->openMemory(); $xmlWriter->openUri('mynewFile.xml'); $xmlWriter->setIndent(true); $xmlWriter->startDocument('1.0', 'UTF-8'); $xmlWriter->startElement('documents'); $doc = new DOMDocument(); foreach($xmlfilenames as $xmlfilename) { $fileContents = file_get_contents($xmlfilename); $xmlWriter->writeElement('document',$fileContents); } $xmlWriter->endElement(); $xmlWriter->endDocument(); $xmlWriter->flush();
Well, the resultant (new) xml file is no longer correct since elements are escaped - i.e. <?xml version="1.0" encoding="UTF-8"?>
<CONFIRMOWNX>
<Confirm>
<LglVeh id="GLE">
<AddrLine1>GLEACHER &amp; COMPANY</AddrLine1>
<AddrLine2>DESCAP DIVISION</AddrLine2>
Can anyone explain how to take the content from the XML file and write them properly to new file?
I'm burnt on this and I KNOW it'll be something simple I'm missing.
Thanks. Robert
Upvotes: 2
Views: 4812
Reputation: 106385
See, the problem is that XMLWriter::writeElement is intended to, well, write a complete XML element. That's why it automatically sanitize (replace &
with &
, for example) the contents of what's been passed to it as the second param.
One possible solution is to use XMLWriter::writeRaw method instead, as it writes the contents as is - without any sanitizing. Obviously it doesn't validate its inputs, but in your case it does not seem to be a problem (as you're working with already checked source).
Upvotes: 4
Reputation: 848
Hmm, Not sure why it's converting it to HTML Characters, but you can decode it like so
htmlspecialchars_decode($data);
It converts special HTML entities back to characters.
Upvotes: -2