mending3

Reputation: 712

Improve performance of PHP DOM XPath lookups; currently takes too long

I have an array that contains 7,000+ values:

$arrayIds = [
    'A001',
    ...,
    'A7500'
];

This foreach loop gets the id attribute of every unit node whose person-name matches, in a given XML file:

$dom = new DOMDocument;
$dom->load('myxml.xml');

$xp = new DOMXPath($dom);

$data = [];

foreach ($arrayIds as $arrayId) {
    $expression = "//unit[@person-name=\"$arrayId\"]/@id";
    $col = $xp->query($expression);

    if ($col && $col->length) {
        foreach ($col as $node) {
            $data[] = $node->nodeValue;
        }
    }
}

It takes approximately 70 seconds. I can't wait any longer than 5 seconds.

What is the fastest way to achieve this?

Segment of the XML file:

<unit person-name="A695" id="PTU-300" xml:space="preserve">
    <source xml:lang="en">Related tutorials</source>
    <seg-source><mrk mid="0" mtype="seg">Related tutorials</mrk></seg-source>
    <target xml:lang="id"><mrk mid="0" mtype="seg">Related tutorials</mrk></target>
</unit>
<unit person-name="A001" id="PTU-4" xml:space="preserve">
    <source xml:lang="en">Related tutorials</source>
    <seg-source><mrk mid="0" mtype="seg">Related tutorials</mrk></seg-source>
    <target xml:lang="id"><mrk mid="0" mtype="seg">Related tutorials</mrk></target>
</unit>
...
<unit>
...
</unit>

For reference, I'm running this on an M1 Mac.

Upvotes: 1

Views: 185

Answers (1)

Nigel Ren

Reputation: 57121

I think the problem is the way you use XPath to find an element. Each query you run for a name searches the whole document, even if the match is the very first unit. This is because the expression could match multiple nodes, so XPath doesn't know it can stop after finding the first. With 7,000+ names, that means 7,000+ full scans of the document.

As an alternative, the code below runs a single XPath query to fetch every person-name attribute, then checks each one against the list of names you are looking for. If it is in the list, it reads the id from the same unit and adds it to the result.

It's difficult to test how long this will take, but it's easier for you to test than me...

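// Start with every requested name mapped to null (null = not found).
$data = array_fill_keys($arrayIds, null);
// Flip so the names become keys and isset() can be used for lookups.
$arrayIds = array_flip($arrayIds);
$expression = "//unit/@person-name";
$cols = $xp->query($expression);
foreach ($cols as $col) {
    if (isset($arrayIds[$col->nodeValue])) {
        // The attribute's parent is the <unit> element it belongs to.
        $parent = $col->parentNode;
        $data[$col->nodeValue] = $parent->attributes->getNamedItem("id")->nodeValue;
    }
}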

Using array_flip() turns the names you are searching for into array keys, so each check is a single isset() lookup rather than a search through the whole array.
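Since I can't time this against your file, here is a rough sketch you could use to compare the two approaches yourself. It builds synthetic XML in memory; the `<units>` root element, the size of 2,000 units, and all variable names are assumptions of mine rather than anything from your data. It also uses `ownerElement` and `getAttribute()` as a shorter equivalent of the `parentNode` lookup above.

```php
<?php
// Synthetic data (assumed): 2000 units wrapped in a hypothetical <units> root.
$n = 2000;
$arrayIds = [];
$xml = '<units>';
for ($i = 1; $i <= $n; $i++) {
    $name = 'A' . $i;
    $arrayIds[] = $name;
    $xml .= "<unit person-name=\"$name\" id=\"PTU-$i\"/>";
}
$xml .= '</units>';

$dom = new DOMDocument;
$dom->loadXML($xml);
$xp = new DOMXPath($dom);

// Approach from the question: one XPath query (one document scan) per name.
$start = microtime(true);
$perId = [];
foreach ($arrayIds as $arrayId) {
    foreach ($xp->query("//unit[@person-name=\"$arrayId\"]/@id") as $node) {
        $perId[] = $node->nodeValue;
    }
}
$perIdSeconds = microtime(true) - $start;

// Single pass: one XPath query, then an isset() lookup per attribute.
$start = microtime(true);
$lookup = array_flip($arrayIds);
$singlePass = [];
foreach ($xp->query('//unit/@person-name') as $attr) {
    if (isset($lookup[$attr->nodeValue])) {
        $singlePass[] = $attr->ownerElement->getAttribute('id');
    }
}
$singlePassSeconds = microtime(true) - $start;

printf("per-name queries: %.3fs, single pass: %.3fs\n", $perIdSeconds, $singlePassSeconds);
```

Both loops collect the same ids here because the synthetic names appear in document order. On a real file the single pass returns ids in document order rather than in the order of `$arrayIds`, so actual timings and ordering may differ.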

I've added the name as the key to the output array, so you get something like...

Array
(
    [A001] => PTU-4
)
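For a quick check without the full file, here is a minimal self-contained sketch that runs the same single-pass idea against the two units from the question. The `<units>` wrapper, the extra `Z999` unit, and the shortened `$arrayIds` list are assumptions added only to make the fragment loadable and to show a name that is skipped.

```php
<?php
// Hypothetical <units> root: the question only shows a fragment of the file.
$xml = <<<XML
<units>
    <unit person-name="A695" id="PTU-300"><source>Related tutorials</source></unit>
    <unit person-name="A001" id="PTU-4"><source>Related tutorials</source></unit>
    <unit person-name="Z999" id="PTU-9"><source>Not requested</source></unit>
</units>
XML;

// Shortened stand-in for the 7000+ names in the question.
$arrayIds = ['A001', 'A695', 'A7500'];

$dom = new DOMDocument;
$dom->loadXML($xml);
$xp = new DOMXPath($dom);

$data = array_fill_keys($arrayIds, null); // names that are never found stay null
$lookup = array_flip($arrayIds);          // names as keys for isset()

foreach ($xp->query('//unit/@person-name') as $attr) {
    if (isset($lookup[$attr->nodeValue])) {
        // ownerElement is the <unit> that carries this attribute.
        $data[$attr->nodeValue] = $attr->ownerElement->getAttribute('id');
    }
}

print_r($data);
```

With this input, A001 maps to PTU-4 and A695 to PTU-300, while A7500 stays null because no unit has that name. Note that if the same name appears on several units, only the last id is kept; collect into an array per name if you need all of them.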

Upvotes: 1
