Ted Logan

Reputation: 414

Remove duplicated entries in XLF with PHP

I have an XLF (XML) file and I want to check with PHP whether it contains duplicated entries and remove the unnecessary ones. I'm looping over all trans-units, pushing each id into an array and checking whether the id already exists in the array. But how can I remove a trans-unit when I find an id that already exists?

My XLF and my PHP Code:

    <?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
        <file source-language="de" target-language="de" datatype="plaintext" original="messages" date="2018-08-24T14:49:31Z" product-name="test">
            <header/>
            <body>
                <trans-unit id="test">
                    <source>123</source>
                    <target/>
                </trans-unit>
                <trans-unit id="test2">
                   <source>123</source>
                   <target/>
                </trans-unit>
                <trans-unit id="test2">
                   <source>123</source>
                   <target/>
                </trans-unit>
                <trans-unit id="test3">
                   <source>123</source>
                   <target/>
                </trans-unit>
                <trans-unit id="test4">
                   <source>123</source>
                   <target/>
                </trans-unit>
            </body>
        </file>
    </xliff>


    function cleanUpXliffFile($file) {
        $transUnitIds = [];
        $xlif = simplexml_load_file($file);
        $xlif->file['source-language'] = "de";
        foreach($xlif->file->body->{'trans-unit'} as $item) {
            $unit = $item->attributes()->id;
            $transUnitId = $unit[0]->__toString();
            if(in_array($transUnitId, $transUnitIds)) {
                //DELETE THE CHILD
            }
            $transUnitIds[] = $transUnitId;
            if (!isset($item->target)) {
                $item->addChild("target");
            }

            if ($item->target->__toString() !== "") {
                $item->source = (string)$item->target;
                $item->target[0] = "";
            }
        }

        $xlif->saveXML($file);
    }

Upvotes: 0

Views: 334

Answers (1)

Professor Abronsius

Reputation: 33813

A simple function that uses DOMDocument rather than SimpleXML seems to work OK. Get the trans-unit nodes, record each id in an array the first time it appears, and call removeChild on any node whose id has already been seen. This does not do the additional handling of the target element that your function performs.

    function cleanXMLFile( $file ){
        $dom=new DOMDocument;
        $dom->load( $file );

        $seen=[];
        // getElementsByTagName returns a live list; copy it to an array first
        // so that removing nodes does not make the loop skip elements
        $col=iterator_to_array( $dom->getElementsByTagName( 'trans-unit' ) );

        foreach( $col as $node ){
            $id=$node->getAttribute('id');
            if( !isset( $seen[ $id ] ) ) $seen[ $id ]=true;   // first occurrence: keep
            else $node->parentNode->removeChild( $node );      // duplicate: remove
        }

        $dom->save( $file );
    }

    cleanXMLFile( __DIR__ . '/xlf.xml' );
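
If you would rather stay in SimpleXML, as in the question, something along these lines may work. It is only a sketch: `removeDuplicateTransUnits()` is a hypothetical helper name, it works on an XML string rather than a file path to keep the example self-contained, and it assumes the same `file->body->trans-unit` layout as the XLF shown above. Duplicates are collected during the loop and removed afterwards through `dom_import_simplexml()`, so nothing is deleted while the iteration is still running.

```php
<?php
// Sketch only: removeDuplicateTransUnits() is a hypothetical helper,
// not part of the code in the question or the answer above.
function removeDuplicateTransUnits(string $xml): string
{
    $xliff = simplexml_load_string($xml);
    $seen = [];        // ids already encountered
    $duplicates = [];  // elements to remove after the loop

    foreach ($xliff->file->body->{'trans-unit'} as $unit) {
        $id = (string) $unit['id'];
        if (isset($seen[$id])) {
            // Defer removal: deleting during iteration can skip elements.
            $duplicates[] = $unit;
        } else {
            $seen[$id] = true;
        }
    }

    foreach ($duplicates as $unit) {
        // Bridge to DOM to detach the element from its parent.
        $node = dom_import_simplexml($unit);
        $node->parentNode->removeChild($node);
    }

    return $xliff->asXML();
}
```

To use it on a file, you could read the contents with `file_get_contents()`, pass them through the function, and write the result back with `file_put_contents()`. The target handling from the question could then run in a separate loop over the remaining trans-units.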

Upvotes: 1