user3877504

Reputation: 23

PHP and DOM: getting text from a child node

I'm trying to get the text data from a child node of an element using PHP and DOM.

Here is the HTML data I'm having trouble parsing. I'm trying to obtain the email address.

<tr>
<th>Engineer:</th>
<td id="contact_person">Jack Smith &lt<a href='mailto:[email protected]'>[email protected]</a>&gt
    <table class='transparent'>
        <tr>
            <td>Work Phone</td>
            <td>(555) 555-5555</td>
        </tr>
    </table>
</td>

Here is my current code for processing that element:

$contact = $dom->getElementById("contact_person")->nodeValue;

This is the result I'm getting:

Jack Smith Work Phone(555) 555-5555

UPDATE: Replacing &lt and &gt with a single hyphen between the name and the email address returns the following:

Jack Smith - [email protected] Phone(555) 555-5555

This is what I want to get:

[email protected]

I asked the developer to move id="contact_person" to the anchor that holds the email address. That works when I try it in a test, but the change is not possible in our system.

I'm sure it's apparent, but I'm not really familiar with DOM and am looking for any guidance...

FINAL UPDATE: THE FIX:

$dom->getElementById("contact_person")->firstChild->nextSibling->nodeValue;

Upvotes: 0

Views: 75

Answers (3)

Niet the Dark Absol

Reputation: 324620

It may be more reliable to use an XPath query rather than using firstChild, nextSibling etc.

$xpath = new DOMXPath($dom);
$node = $xpath->query("//*[@id='contact_person']//a[contains(@href,'mailto:')]")->item(0);
if ($node) {
    $email = $node->nodeValue;
} else {
    $email = "NOT FOUND";
}

This looks for any link whose href contains "mailto:", regardless of where it sits inside #contact_person. The query therefore no longer relies on the precise structure of the cell, only on the container's ID and on the link being a mailto link.
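For anyone who wants to try the query end to end, here is a minimal, self-contained sketch. It makes some assumptions: the markup is a cleaned-up copy of the question's HTML (with `&lt;` and `&gt;` properly terminated and the row closed), and `jack.smith@example.com` is a made-up placeholder, since the real address is not shown in the question.

```php
<?php
// Hypothetical sample markup modelled on the question's HTML.
// The email address is a placeholder, not the real one.
$html = <<<'HTML'
<table>
<tr>
<th>Engineer:</th>
<td id="contact_person">Jack Smith &lt;<a href="mailto:jack.smith@example.com">jack.smith@example.com</a>&gt;
    <table class="transparent">
        <tr><td>Work Phone</td><td>(555) 555-5555</td></tr>
    </table>
</td>
</tr>
</table>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Find the first mailto link anywhere inside the element with id="contact_person".
$xpath = new DOMXPath($dom);
$node = $xpath->query("//*[@id='contact_person']//a[contains(@href,'mailto:')]")->item(0);

// item(0) returns null when nothing matched, so guard before reading the text.
$email = $node ? trim($node->nodeValue) : null;

echo $email ?? 'NOT FOUND', PHP_EOL;
```

Because the query matches on the link itself, it should keep working if the surrounding text or the nested phone table changes, as long as the link stays inside the cell.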

Upvotes: 0

user3877504

Reputation: 23

This is ultimately what fixed the issue:

$dom->getElementById("contact_person")->firstChild->nextSibling->nodeValue;
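To see why that chain lands on the email address, it can help to list the direct children of the cell. The sketch below is only an illustration under stated assumptions: the markup is a tidied version of the question's HTML with no whitespace after the opening td tag, and `jack.smith@example.com` is a placeholder address. With different whitespace or parser behaviour, the child order could differ.

```php
<?php
// Hypothetical sample markup; the address is a placeholder.
$html = '<table><tr><th>Engineer:</th>'
      . '<td id="contact_person">Jack Smith &lt;'
      . '<a href="mailto:jack.smith@example.com">jack.smith@example.com</a>&gt;'
      . '<table class="transparent"><tr><td>Work Phone</td><td>(555) 555-5555</td></tr></table>'
      . '</td></tr></table>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$td = $dom->getElementById('contact_person');

// Collect the node names of the cell's direct children, in document order.
$children = [];
foreach ($td->childNodes as $child) {
    $children[] = $child->nodeName;
}
echo implode(', ', $children), PHP_EOL;

// firstChild is the leading text node ("Jack Smith <");
// its nextSibling is the anchor, whose nodeValue is the link text.
$email = $td->firstChild->nextSibling->nodeValue;
echo $email, PHP_EOL;
```

The chain depends on the anchor being exactly the second child of the cell, so it would need adjusting if the markup before the link ever changes.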

Upvotes: 1

bitfiddler

Reputation: 2115

Try something like:

$contact = $dom->getElementById("contact_person")->firstChild->nodeValue;

Upvotes: 0
