JPashs

Reputation: 13886

Get xml file to find and replace text. PHP

I need to change text in an XML file using PHP, so I wrote code to:

1- get the file

2- replace the text

3- save the file under another name.

The problem is that some of the replacements fail.

I can replace simple strings, but I cannot replace text that contains characters like '<'. The real code and files are below.

Original XML path: http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml

1) This code just changes the text Inmuebles to xxxxxxxx. This works fine.

$xml_external_path = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml';
$xml = file_get_contents($xml_external_path);

$response = strtr($xml, array(
    'Inmuebles' => 'xxxxxxxx'
));

$newXml = $response;

$newXml = simplexml_load_string( $newXml );
$newXml->asXml('/home/csainmobiliaria/www/pisos-NEW.xml');

2) Now, if I use this code to change the text <Table Name="Inmuebles"> to <xxxxxxxx>, I get an error 500.

$xml_external_path = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml';
$xml = file_get_contents($xml_external_path);

$response = strtr($xml, array(
    '<Table Name="Inmuebles">' => '<xxxxxxxx>'
));

$newXml = $response;

$newXml = simplexml_load_string( $newXml );
$newXml->asXml('/home/csainmobiliaria/www/pisos-NEW.xml');

3) In the same way, if I use this code to remove the text <Publicacion>, I get an error 500.

$xml_external_path = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml';
$xml = file_get_contents($xml_external_path);

$response = strtr($xml, array(
    '<Publicacion>' => ''
));

$newXml = $response;

$newXml = simplexml_load_string( $newXml );
$newXml->asXml('/home/csainmobiliaria/www/pisos-NEW.xml');

This is the final result I need to get: http://www.csainmobiliaria.com/imagenes/fotos/pisos-OK.xml


Upvotes: 4

Views: 3075

Answers (3)

Parfait

Reputation: 107587

Consider, again, XSLT: the W3C-standard, special-purpose language designed to transform XML files to a required specification, which covers your needs #1-3. Like SQL, the other popular declarative language, XSLT is not tied to PHP. It is portable to other application layers (Java, C#, Python, Perl, R) and to dedicated XSLT 1.0, 2.0, and 3.0 executable processors.

With this approach, XSLT's recursive template matching lets you avoid foreach loops, if logic, and repeated addChild or appendChild calls at the application layer.

XSLT (save as an .xsl file, a special .xml file, or embedded string; portable to other interfaces beyond PHP)

<?xml version="1.0"?>
 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" indent="yes" encoding="ISO-8859-1"/>
     <xsl:strip-space elements="*"/>

     <!-- WALK DOWN TREE FROM ROOT -->
     <xsl:template match="Publicacion">
        <xsl:apply-templates select="Table"/>
     </xsl:template>

     <xsl:template match="Table[@Name='Inmuebles']">
         <Inmuebles>
             <xsl:apply-templates select="*"/>
         </Inmuebles>
     </xsl:template>

     <!-- EMPTY TEMPLATE TO REMOVE SPECIFIED NODES -->
     <xsl:template match="Table[@Name='Agencias']"/>

     <!-- RETURN ONLY FIRST FIVE NODES -->
     <xsl:template match="Table/*">
         <Inmuebles>
             <xsl:copy-of select="*[position() &lt;= 5]"/>
         </Inmuebles>
     </xsl:template>

</xsl:stylesheet>

XSLT Demo

PHP (using the php_xsl library)

// LOAD XML SOURCE
$url = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml';
$web_data = file_get_contents($url);
$xml = new SimpleXMLElement($web_data);

// LOAD XSL SCRIPT
$xsl = simplexml_load_file('/path/to/script.xsl');

// XSLT TRANSFORMATION
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl); 
$newXML = $proc->transformToXML($xml);

// OUTPUT TO CONSOLE
echo $newXML;

// SAVE TO FILE
file_put_contents('Output.xml', $newXML);

And as the great XSLT guru, @Dimitre Novatchev, usually ends his posts: the wanted, correct result is produced:

<?xml version="1.0" encoding="ISO-8859-1"?>
<Inmuebles>
   <Inmuebles>
      <IdInmobiliariaExterna>B45695855</IdInmobiliariaExterna>
      <IdPisoExterno>100002</IdPisoExterno>
      <FechaHoraModificado>30/11/2018</FechaHoraModificado>
      <TipoInmueble>PISO</TipoInmueble>
      <TipoOperacion>3</TipoOperacion>
   </Inmuebles>
   <Inmuebles>
      <IdInmobiliariaExterna>B45695855</IdInmobiliariaExterna>
      <IdPisoExterno>100003</IdPisoExterno>
      <FechaHoraModificado>30/11/2018</FechaHoraModificado>
      <TipoInmueble>CHALET</TipoInmueble>
      <TipoOperacion>4</TipoOperacion>
   </Inmuebles>
</Inmuebles>

Upvotes: 0

Nigel Ren

Reputation: 57121

DOMDocument allows you to copy whole node structures. Rather than copying each detail individually (which risks missing data when the specification changes), you can copy an entire node (such as <Inmueble>) from one document to another using importNode(), whose second parameter indicates that the full content of the element should be copied. This approach also lets you copy any of the tables with the same function, without code changes...

function extractData ( $sourceFile, $table )    {
    // Load source data
    $source = new DOMDocument();
    $source->load($sourceFile);
    $xp = new DOMXPath($source);

    // Create new data document
    $newFile = new DOMDocument();
    $newFile->formatOutput = true;
    // Create base element with the table name in new document
    $newRoot = $newFile->createElement($table);
    $newFile->appendChild($newRoot);

    // Find the records to copy
    $records = $xp->query('//Table[@Name="'.$table.'"]/*');
    foreach ( $records as $record ) {
        // Import the node to copy and append it to new document
        $newRoot->appendChild($newFile->importNode($record, true));
    }
    // Return the source of the XML
    return $newFile->saveXML();
}

echo extractData ($xml_external_path, "Inmuebles");

You could alter the method to return the document as DOMDocument or even a SimpleXML version if you wished to process it further.

For SimpleXML, change the return to...

return simplexml_import_dom($newRoot);

and then you can call it as...

$ret = extractData ($xml_external_path, "Inmuebles");
echo $ret->asXML();

Or if you just want a fixed way of doing this, you can remove the XPath and just use getElementsByTagName() to find the nodes to copy...

$source = new DOMDocument();
$source->load($xml_external_path);

$newFile = new DOMDocument();
$newRoot = $newFile->createElement("Inmuebles");
$newFile->appendChild($newRoot);

// Find the records to copy
foreach ( $source->getElementsByTagName("Inmueble") as $record ) {
    $newRoot->appendChild($newFile->importNode($record, true));
}
echo $newFile->saveXML();
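
To try the deep copy without fetching the remote feed, here is a minimal sketch that runs the same loop against an inline document. The sample XML is hypothetical: it only mirrors the structure implied by the question (a Publicacion root holding Table elements), and the real file has many more fields.

```php
<?php
// Hypothetical sample data; the real feed is larger and may differ in detail.
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<Publicacion>
  <Table Name="Inmuebles">
    <Inmueble><IdPisoExterno>100002</IdPisoExterno><TipoInmueble>PISO</TipoInmueble></Inmueble>
    <Inmueble><IdPisoExterno>100003</IdPisoExterno><TipoInmueble>CHALET</TipoInmueble></Inmueble>
  </Table>
  <Table Name="Agencias">
    <Agencia><Nombre>Demo</Nombre></Agencia>
  </Table>
</Publicacion>
XML;

$source = new DOMDocument();
$source->loadXML($sample);

$newFile = new DOMDocument('1.0', 'UTF-8');
$newFile->formatOutput = true;
$newRoot = $newFile->createElement('Inmuebles');
$newFile->appendChild($newRoot);

foreach ($source->getElementsByTagName('Inmueble') as $record) {
    // Second argument true = deep copy, so child elements and text come along.
    $newRoot->appendChild($newFile->importNode($record, true));
}

// Counts used to confirm what ended up in the new document.
$copied   = $newFile->getElementsByTagName('Inmueble')->length;
$agencias = $newFile->getElementsByTagName('Agencia')->length;

echo $newFile->saveXML();
```

Only the Inmueble records are imported, so the Agencias table never reaches the new document and nothing has to be removed afterwards.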

To support a save file name, I've added a new parameter to the function. This new version doesn't return anything: it just loads the file and saves the result under the new file name...

function extractData ( $sourceFile, $table, $newFileName )    {
    // Load source data
    $source = new DOMDocument();
    $source->load($sourceFile);
    $xp = new DOMXPath($source);

    // Create new file document
    $newFile = new DOMDocument();
    $newFile->formatOutput = true;
    // Create base element with the table name in new document
    $newRoot = $newFile->createElement($table);
    $newFile->appendChild($newRoot);

    // Find the records to copy
    $records = $xp->query('//Table[@Name="'.$table.'"]/*');
    foreach ( $records as $record ) {
        // Import the node to copy and append it to new document
        $importNode = $newFile->importNode($record, true);
        // Add new content
        $importNode->appendChild($newFile->createElement("Title", "value"));
        $newRoot->appendChild($importNode);
    }

    // Update Foto elements
    $xp = new DOMXPath($newFile);
    $fotos = $xp->query("//*[starts-with(local-name(), 'Foto')]");
    foreach ( $fotos as $foto ) {
        $path = $foto->nodeValue;
        if( substr($path, 0, 5) == "/www/" )    {
            $path = substr($path,4);
        }
        // Replace node with new version
        $foto->parentNode->replaceChild($newFile->createElement("Foto1", $path), 
                  $foto);
    }  

    $newFile->save($newFileName);
}
$xml_external_path = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos.xml';
$xml_external_savepath = 'saveFile.xml';

extractData ($xml_external_path, "Inmuebles", $xml_external_savepath);
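
The path rewrite inside the Foto loop can be checked on its own. The sketch below isolates that one step in a helper. The name stripWwwPrefix and the sample paths are hypothetical and are not part of the function above.

```php
<?php
// Hypothetical helper; it mirrors the substr() check used in the Foto loop.
function stripWwwPrefix(string $path): string
{
    // Drop the leading "/www" but keep the slash that follows it.
    if (strncmp($path, '/www/', 5) === 0) {
        return substr($path, 4);
    }
    // Any other path is returned unchanged.
    return $path;
}

echo stripWwwPrefix('/www/imagenes/fotos/1.jpg'), PHP_EOL;
echo stripWwwPrefix('/imagenes/fotos/2.jpg'), PHP_EOL;
```

Only paths that start with "/www/" are shortened, and a path such as "/wwwx/..." is left alone because the comparison includes the trailing slash.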

Upvotes: 4

Maksym Fedorov

Reputation: 6456

You can copy the node you need instead of removing the excess elements. For example, you can copy the Inmuebles node with the help of SimpleXML:

$path = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml';
$content = file_get_contents($path);
$sourceXML = new SimpleXMLElement($content);

$targetXML = new SimpleXMLElement("<Inmuebles></Inmuebles>");

$items = $sourceXML->xpath('Table[@Name=\'Inmuebles\']');
foreach ($items as $item) {
    foreach ($item->Inmueble as $inmueble) {
        $node  = $targetXML->addChild('Inmueble');
        $node->addChild('IdInmobiliariaExterna', $inmueble->IdInmobiliariaExterna);
        $node->addChild('IdPisoExterno', $inmueble->IdPisoExterno);
        $node->addChild('FechaHoraModificado', $inmueble->FechaHoraModificado);
        $node->addChild('TipoInmueble', $inmueble->TipoInmueble);
        $node->addChild('TipoOperacion', $inmueble->TipoOperacion);
    }
}

echo $targetXML->asXML();

Also, as @ThW said in the comments, you can use XSLT, for example:

$path = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml';
$content = file_get_contents($path);
$sourceXML = new SimpleXMLElement($content);

$xslt='<?xml version="1.0" encoding="ISO-8859-1"?>
         <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
         <xsl:output method="xml" indent="yes"/>

         <xsl:template match="Table[@Name=\'Inmuebles\']">
             <Inmuebles>
                 <xsl:copy-of select="node()"/>
             </Inmuebles>
         </xsl:template>

         <xsl:template match="Table[@Name=\'Agencias\']"/>
</xsl:stylesheet>';


$xsl = new SimpleXMLElement($xslt);

$processor = new XSLTProcessor;
$processor->importStyleSheet($xsl);
$result = $processor->transformToXML($sourceXML);
$targetXML = new SimpleXMLElement($result);
echo $targetXML->asXML();
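
As background to the copy-based approaches: the error 500 in the question is most likely a fatal error raised after parsing fails. That is an assumption, since the server log is not shown. Removing only the opening <Publicacion> tag (or renaming only the opening <Table Name="Inmuebles"> tag) leaves an unmatched closing tag, so simplexml_load_string() returns false and the following ->asXml() call is made on a boolean. A minimal sketch, using a hypothetical miniature of the feed:

```php
<?php
// Hypothetical miniature of the feed; the real file is much larger.
$xml = '<Publicacion><Table Name="Inmuebles"><Inmueble/></Table></Publicacion>';

// Removing only the opening tag leaves </Publicacion> without a partner.
$broken = strtr($xml, array('<Publicacion>' => ''));

// Collect parse errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$parsed = simplexml_load_string($broken);

if ($parsed === false) {
    // Report why the string is no longer well-formed XML.
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), PHP_EOL;
    }
    libxml_clear_errors();
} else {
    echo $parsed->asXML();
}
```

Checking the return value with === false before calling asXML() turns the failure into a readable parser message instead of a fatal error.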

Upvotes: 4
