al-dr

Reputation: 147

PHP DOMXPath: how to strip HTML tags and their contents from nodeValue?

Given this XML:

<root>
    <main>
        <cont>
            <p>hello<a>world</a></p>
            <p>hello</p>
            <p>hello<a>world</a></p>
        </cont>
    </main>
</root>

I need to get only the text inside the <cont> tag, without the <a> tags and their contents.

So the result should be hello hello hello, without world.

Upvotes: 0

Views: 557

Answers (2)

Ja͢ck

Reputation: 173562

You can select the text nodes that are direct children of each <p> tag. The text inside <a> belongs to the <a> element rather than to <p>, so it is not matched:

$dom = new DOMDocument;
$dom->loadXML($xmlData);

$xpath = new DOMXPath($dom);

foreach ($xpath->query('//cont/p/text()') as $text) {
    echo $text->textContent, "\n";
}
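To get the single string asked for in the question, the matched text nodes can be collected and joined. This is a minimal, self-contained sketch: the sample XML is inlined as $xmlData, and the $parts and $result names are only illustrative.

```php
<?php
// Sample document from the question, inlined so the sketch runs on its own.
$xmlData = <<<XML
<root>
    <main>
        <cont>
            <p>hello<a>world</a></p>
            <p>hello</p>
            <p>hello<a>world</a></p>
        </cont>
    </main>
</root>
XML;

$dom = new DOMDocument;
$dom->loadXML($xmlData);

$xpath = new DOMXPath($dom);

// Collect only the text nodes that sit directly under each <p>.
$parts = [];
foreach ($xpath->query('//cont/p/text()') as $text) {
    $parts[] = trim($text->nodeValue);
}

// Join the pieces with a single space.
$result = implode(' ', $parts);
echo $result, "\n"; // prints "hello hello hello"
```

If a <p> had text both before and after its <a>, each piece would come back as a separate text node, so the join step would also need to decide how to combine those.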

Upvotes: 1

user3769335

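simplexml_load_string() or simplexml_load_file() should be enough. Casting a SimpleXMLElement to a string returns only the text directly inside that element, not the text of its child elements: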

$xml_string = '<root> <main> <cont> <p>hello<a>world</a></p> <p>hello</p> <p>hello<a>world</a></p> </cont> </main></root>';
$xml = simplexml_load_string($xml_string);
$p = $xml->main->cont->p;

// Collect the direct text of each <p>; the <a> children are skipped by the cast.
$paragraphs = [];
foreach ($p as $value) {
    $paragraphs[] = (string) $value;
}

echo '<pre>';
print_r($paragraphs);

Should show something like:

Array
(
    [0] => hello
    [1] => hello
    [2] => hello
)

Upvotes: 1
