Reputation: 2421
I'm using DOMDocument and XPath.
Given the following XML:
<Description>
<CompleteText>
<DetailTxt>
<Text>
<span>Here there is some text</span>
<h2>And maybe a headline</h2>
<br/>
<span>Normal position</span>
<br/>
<span> </span>
<br/>
</Text>
</DetailTxt>
</CompleteText>
</Description>
The node /Description/CompleteText/DetailTxt/Text contains markup that is unfortunately unescaped, and I can't change that. Is there any way to query that content while keeping the HTML markup?
I have obviously tried nodeValue and also textContent, but both give me the content without the markup.
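To illustrate the problem, here is a minimal sketch. The XML is shortened from the sample above, and the variable names ($doc, $xpath, $text) are only illustrative.

```php
<?php
// Shortened version of the sample XML, kept on one line for brevity.
$xml = '<Description><CompleteText><DetailTxt><Text>'
     . '<span>Here there is some text</span><h2>And maybe a headline</h2><br/>'
     . '</Text></DetailTxt></CompleteText></Description>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Select the <Text> element that holds the unescaped markup.
$text = $xpath->query('/Description/CompleteText/DetailTxt/Text')->item(0);

// Both properties return only the concatenated text of the descendants,
// so the <span>, <h2> and <br/> tags are lost.
echo $text->nodeValue, "\n";
echo $text->textContent, "\n";
```

Both lines print the text of the child elements run together, with no tags at all.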
Upvotes: 1
Views: 241
Reputation: 2421
I got a good result using the C14N method of DOMNode.
http://sandbox.onlinephpfunctions.com/code/90dc915c9a43c91d31fcd47d37e89df430951b2e
<?php
$xml = <<<'EOD'
<Description>
<CompleteText>
<DetailTxt>
<Text>
<span>Here there is some text</span>
<h2>And maybe a headline</h2>
<br/>
<span>Normal position</span>
<br/>
<span> </span>
<br/>
</Text>
</DetailTxt>
</CompleteText>
</Description>
EOD;
$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
// Select the <Text> element that holds the markup.
$domNodes = $xpath->query('/Description/CompleteText/DetailTxt/Text');
$domNode = $domNodes->item(0);
// C14N() serializes the selected element itself in canonical XML form,
// so the <Text> wrapper is part of the output.
$innerHTML = $domNode->C14N();
echo $innerHTML;
<Text>
<span>Here there is some text</span>
<h2>And maybe a headline</h2>
<br></br>
<span>Normal position</span>
<br></br>
<span> </span>
<br></br>
</Text>
It seems shorter in a way, what do you think? I would still need to get rid of the enclosing <Text> node, though. Thanks also for pointing me to PHP Sandbox.
I realize that C14N() changes the markup: <br /> becomes <br></br>.
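If both the <Text> wrapper and the <br></br> expansion are a problem, one possible workaround is to serialize each child node with DOMDocument::saveXML() instead. This is only a sketch under my own assumptions: the helper name innerXML is made up for illustration, and the XML is shortened from the sample.

```php
<?php
// Hypothetical helper: concatenates the XML serialization of every child
// of $node, which leaves out the element itself.
function innerXML(DOMNode $node): string
{
    $xml = '';
    foreach ($node->childNodes as $child) {
        // saveXML() with a node argument serializes just that node,
        // without an XML declaration.
        $xml .= $node->ownerDocument->saveXML($child);
    }
    return $xml;
}

// Shortened version of the sample XML.
$source = '<Description><CompleteText><DetailTxt><Text>'
        . '<span>Here there is some text</span><br/><span> </span>'
        . '</Text></DetailTxt></CompleteText></Description>';

$doc = new DOMDocument();
$doc->loadXML($source);
$xpath = new DOMXPath($doc);

$text = $xpath->query('/Description/CompleteText/DetailTxt/Text')->item(0);
echo innerXML($text);
```

Because this uses the XML serializer rather than canonicalization, empty elements should stay in their short <br/> form.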
Upvotes: 0
Reputation: 167716
You can use the saveHTML method of DOMDocument to serialize a node as HTML. In your case you seem to want to call it on each child node of the selected node and concatenate the strings. In the browser DOM APIs that would be called innerHTML, so I have written a function of that name that does this, and I have also used the ability to call PHP functions from XPath in the following snippet:
<?php
$xml = <<<'EOD'
<Description>
<CompleteText>
<DetailTxt>
<Text>
<span>Here there is some text</span>
<h2>And maybe a headline</h2>
<br/>
<span>Normal position</span>
<br/>
<span> </span>
<br/>
</Text>
</DetailTxt>
</CompleteText>
</Description>
EOD;
$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
function innerHTML($nodeList) {
$node = $nodeList[0];
$html = '';
$containingDoc = $node->ownerDocument;
foreach ($node->childNodes as $child) {
$html .= $containingDoc->saveHTML($child);
}
return $html;
}
$xpath->registerNamespace("php", "http://php.net/xpath");
$xpath->registerPHPFunctions("innerHTML");
$innerHTML = $xpath->evaluate('php:function("innerHTML", /Description/CompleteText/DetailTxt/Text)');
echo $innerHTML;
The output, as shown at http://sandbox.onlinephpfunctions.com/code/62a980e2d2a2485c2648e16fc647a6bd6ff5620b, is:
<span>Here there is some text</span>
<h2>And maybe a headline</h2>
<br>
<span>Normal position</span>
<br>
<span> </span>
<br>
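Calling the function from XPath is optional. As a rough sketch of the same idea in plain PHP, you could run query() and pass the first match to a helper directly. The name innerHtmlOf and the empty-string fallback for "no match" are my own assumptions here, and the XML is shortened from the sample.

```php
<?php
// Hypothetical variant of the helper that takes a single DOMNode
// instead of the node array that php:function passes in.
function innerHtmlOf(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes just that node as HTML.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Shortened version of the sample XML.
$xml = '<Description><CompleteText><DetailTxt><Text>'
     . '<span>Normal position</span><br/>'
     . '</Text></DetailTxt></CompleteText></Description>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$nodes = $xpath->query('/Description/CompleteText/DetailTxt/Text');

// Assumption: fall back to an empty string when the path matches nothing.
$result = $nodes->length > 0 ? innerHtmlOf($nodes->item(0)) : '';
echo $result;
```

This avoids registerNamespace() and registerPHPFunctions() entirely, at the cost of one extra line to pick the node out of the result list.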
Upvotes: 1