Nikolaj

Reputation: 171

Getting the text portion of a node using php Simple XML

Given the PHP code:

$xml = <<<EOF
<articles>
<article>
This is a link
<link>Title</link>
with some text following it.
</article>
</articles>
EOF;

function traverse($xml) {
    $result = "";
    foreach($xml->children() as $x) {
        if ($x->count()) {
            $result .= traverse($x);
        }
        else {
            $result .= $x;
        }
    }
    return $result;
}

$parser = new SimpleXMLElement($xml);
traverse($parser);

I expected the function traverse() to return:

This is a link Title with some text following it.

However, it returns only:

Title

Is there a way to get the expected result using SimpleXML? (Obviously, the real purpose is to consume the data rather than just return it as in this simple example.)

Upvotes: 8

Views: 18306

Answers (7)

Dan Jones

Reputation: 1440

Try this:

$parser = new SimpleXMLElement($xml);
echo html_entity_decode(strip_tags($parser->asXML()));

That's pretty much equivalent to:

$parser = simplexml_load_string($xml);
echo dom_import_simplexml($parser)->textContent;

Upvotes: 1

Rimer

Reputation: 2074

This has already been answered, but casting to string (i.e. $sString = (string) $oSimpleXMLNode->TagName;) has always worked for me.
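As a rough sketch of what the cast returns for the XML in the question (the variable names $linkText and $ownText are illustrative, not from the thread): casting a leaf element such as <link> gives its text, while casting the mixed-content <article> gives only its own text nodes, so "Title" is missing from the result.

```php
<?php
$xml = <<<EOF
<articles>
<article>
This is a link
<link>Title</link>
with some text following it.
</article>
</articles>
EOF;

$articles = new SimpleXMLElement($xml);
$article  = $articles->article;

// Casting a leaf element returns its text content.
$linkText = (string) $article->link;

// Casting an element with mixed content returns only its direct text
// nodes, concatenated; the text of child elements is not included.
$ownText = (string) $article;

var_dump($linkText);                          // string(5) "Title"
var_dump(strpos($ownText, 'Title') === false); // bool(true)
```

So the cast is enough when the element holds only text, but it does not solve the mixed-content case from the question.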

Upvotes: 1

Rbista

Reputation: 51

$node->asXML(); // I think this is the simplest solution.

Upvotes: 5

Nikolaj

Reputation: 171

So, the simple answer to my question was: SimpleXML can't handle this kind of XML, where an element mixes text with child elements. Use DOMDocument instead.

This example shows how to traverse the entire XML. It seems that DOMDocument will work with any XML, whereas SimpleXML requires the XML to be simple.

function attrs($list) {
    $result = "";
    foreach ($list as $attr) {
        $result .= " $attr->name='$attr->value'";
    }
    return $result;
}

function parseTree($xml) {
    $result = "";
    foreach ($xml->childNodes as $item) {
        if ($item->nodeType === XML_ELEMENT_NODE) {
            $result .= "<$item->nodeName" . attrs($item->attributes) . ">" . parseTree($item) . "</$item->nodeName>";
        }
        else {
            $result .= $item->nodeValue;
        }
    }
    return $result;
}

$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xml);

print parseTree($xmlDoc->documentElement);

You could also load the XML using SimpleXML and then convert it to DOM using dom_import_simplexml(), as Josh said. This would be useful if you are using SimpleXML to filter nodes for parsing, e.g. using XPath.

However, I don't actually use SimpleXML, so for me that would be taking the long way around.

$simpleXml = new SimpleXMLElement($xml);
$xmlDom = dom_import_simplexml($simpleXml);

print parseTree($xmlDom);

Thank you for all the help!

Upvotes: 4

Josh Davis

Reputation: 28730

There might be ways to achieve what you want using only SimpleXML, but in this case, the simplest way to do it is to use DOM. The good news is if you're already using SimpleXML, you don't have to change anything as DOM and SimpleXML are basically interchangeable:

// either
$articles = simplexml_load_string($xml);
echo dom_import_simplexml($articles)->textContent;

// or
$dom = new DOMDocument;
$dom->loadXML($xml);
echo $dom->documentElement->textContent;

Assuming your task is to iterate over each <article/> and get its content, your code will look like

$articles = simplexml_load_string($xml);
foreach ($articles->article as $article)
{
    $articleText = dom_import_simplexml($article)->textContent;
}
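Note that textContent keeps the line breaks from the source XML, so it will not match the single-line string from the question as is. A minimal sketch, assuming you want runs of whitespace collapsed into single spaces (the $texts array and the preg_replace step are additions for illustration, not part of the answer above):

```php
<?php
$xml = <<<EOF
<articles>
<article>
This is a link
<link>Title</link>
with some text following it.
</article>
</articles>
EOF;

$articles = simplexml_load_string($xml);

$texts = [];
foreach ($articles->article as $article) {
    // textContent concatenates every descendant text node in document order.
    $raw = dom_import_simplexml($article)->textContent;

    // Collapse newlines and repeated spaces, then trim both ends.
    $texts[] = trim(preg_replace('/\s+/', ' ', $raw));
}

echo $texts[0], PHP_EOL;
// prints: This is a link Title with some text following it.
```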

Upvotes: 17

mailo

Reputation: 2611

Like @tandu said, it's not possible, but if you can modify your XML, this will work:

$xml = <<<EOF
<articles>
    <article>
        This is a link
    </article>
    <link>Title</link>
    <article>
       with some text following it.
    </article>
</articles>
EOF;

Upvotes: 0

Explosion Pills

Reputation: 191809

You can get the text node of a DOM element with simplexml just by treating it like a string:

$result = "";
foreach ($xml->children() as $x) {
    $result .= "$x";
}

However, this prints out:

This is a link

with some text following it.
TitleTitle

...because the text is treated as one block and there is no way to tell where the child element fits inside it. The child node is also added twice because of the other else {}, but you can just take that out.

Sorry if I didn't help much, but I don't think there's any way to find out where the child node fits in the text node unless the xml is consistent (but then, why not use tags). If you know what element you want to strip the text out of, strip_tags() will work great.
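For what it's worth, here is a small sketch of the strip_tags() idea applied to a single known element rather than the whole document. The $plain variable, the entity decoding, and the whitespace cleanup are assumptions added for illustration; whether they fit depends on your data.

```php
<?php
$xml = <<<EOF
<articles>
<article>
This is a link
<link>Title</link>
with some text following it.
</article>
</articles>
EOF;

$articles = new SimpleXMLElement($xml);

// Serialize just the <article> element back to markup, then drop the tags.
$markup   = $articles->article->asXML();
$stripped = strip_tags($markup);

// asXML() leaves characters such as & and < escaped, so decode them
// after the tags are gone, then tidy up the whitespace.
$decoded = html_entity_decode($stripped, ENT_QUOTES | ENT_XML1, 'UTF-8');
$plain   = trim(preg_replace('/\s+/', ' ', $decoded));

echo $plain, PHP_EOL;
// prints: This is a link Title with some text following it.
```

This keeps the child's text in its original position because the tags are removed from the serialized markup, not from SimpleXML's object view of the element.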

Upvotes: 1
