LINKeRxUA

Reputation: 559

PHP DOMDocument loadHTML(): how to keep it from changing the markup?

Hi everyone who reads this :) My problem is with this: $dom_doc = new DOMDocument("1.0", "utf-8"); $dom_doc->loadHTML($doc);

$doc looks like:

...
<images>
 <img>
   <file>myfile.jpg</file>
   <desc>My file description</desc>
 </img>
 <img>
   <file>myfile.jpg</file>
   <desc>My file description</desc>
 </img>
</images>
...

loadHTML() converts these tags into single (self-closing) HTML tags such as img, link, etc.:

...
<images>
 <img/>
 <file>myfile.jpg</file>
 <desc>My file description</desc>
 <img/>
 <file>myfile.jpg</file>
 <desc>My file description</desc>
</images>
...

What should I do to force paired tags? Maybe use loadXML(), but it does not work correctly with XPath: the selector "//images" finds nothing. So I would prefer to use loadHTML().

Upvotes: 1

Views: 299

Answers (1)

ThW

Reputation: 19502

This is not HTML but XML. If you load it as HTML, the DOM parser has to parse it according to HTML rules, which means, for example, that img has no closing tag and cannot contain child elements.
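As a minimal sketch of the difference, assuming the document declares no namespace (the input and the variable names here are illustrative, not taken from the real file), parsing the same markup with loadXML() keeps file and desc nested inside img, and a plain //images/img expression matches:

```php
<?php
// Illustrative input: same structure as in the question, with no xmlns declaration assumed.
$xml = <<<'XML'
<images>
 <img>
   <file>myfile.jpg</file>
   <desc>My file description</desc>
 </img>
</images>
XML;

$document = new DOMDocument();
$document->loadXML($xml);   // XML rules: <img> is an ordinary element with children
$xpath = new DOMXPath($document);

// Number of <file> elements that are still children of <img>.
$nestedFiles = (int) $xpath->evaluate('count(//images/img/file)');
echo $nestedFiles, "\n";
```

If this count is zero for the real document even though the markup looks the same, a namespace declaration is the likely cause.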

I expect you have a namespace definition in your XML. If so, unprefixed names in an XPath expression will not match its elements, and you have to register a prefix for that namespace and use it in the expression.

$xml = <<<'XML'
<images xmlns="urn:some-namespace">
 <img>
   <file>myfile.jpg</file>
   <desc>My file description</desc>
 </img>
</images>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);
$xpath->registerNamespace('x', 'urn:some-namespace');

foreach ($xpath->evaluate('//x:images/x:img') as $img) {
  var_dump(
    [
      'file' => $xpath->evaluate('string(x:file)', $img),
      'desc' => $xpath->evaluate('string(x:desc)', $img)
    ]
  );
}

Output:

array(2) {
  ["file"]=>
  string(10) "myfile.jpg"
  ["desc"]=>
  string(19) "My file description"
}
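To check whether a default namespace really is the reason "//images" finds nothing, one option is to inspect the namespaceURI of the root element. The sketch below reuses the placeholder urn:some-namespace from the example above. The local-name() expression is a namespace-agnostic alternative, shown as one possible approach and not necessarily the best fit for the real document.

```php
<?php
// Placeholder document; urn:some-namespace stands in for whatever the real file declares.
$xml = '<images xmlns="urn:some-namespace"><img><file>myfile.jpg</file></img></images>';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// A non-null namespaceURI on the root means unprefixed XPath names will not match.
$namespace = $document->documentElement->namespaceURI;

// Unprefixed name: looks for <images> in no namespace, so nothing matches.
$plainCount = $xpath->evaluate('//images')->length;

// Namespace-agnostic match on the local name only.
$localCount = $xpath->evaluate('//*[local-name()="images"]')->length;

var_dump($namespace, $plainCount, $localCount);
```

Registering a prefix, as in the answer's code, is usually easier to read than local-name() once the namespace URI is known.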

Upvotes: 2
