hemnath mouli

Reputation: 2755

PHP DOM: get attribute with DOM

I am using DOMDocument and DOMXPath, and I am trying to read an attribute whose value looks like JSON. I don't get the complete value, though. Other attributes come back fine; only this one does not. The HTML looks like this:

<a href="URL" title="{tt4438848=Nicholas Stoller (dir.), Seth Rogen, Rose Byrne, tt2567026=James Bobin (dir.), Mia Wasikowska, Johnny Depp, tt3498820=Anthony Russo (dir.), Chris Evans, Robert Downey Jr., tt2948356=Byron Howard (dir.), Ginnifer Goodwin, Jason Bateman, tt3385516=Bryan Singer (dir.), James McAvoy, Michael Fassbender, tt1985949=Clay Kaytis (dir.), Jason Sudeikis, Josh Gad, tt3068194=Whit Stillman (dir.), Kate Beckinsale, Chloë Sevigny, tt3799694=Shane Black (dir.), Russell Crowe, Ryan Gosling, tt3040964=Jon Favreau (dir.), Neel Sethi, Bill Murray, tt2241351=Jodie Foster (dir.), George Clooney, Julia Roberts}">X-Men: Apocalypse</a>

If I use echo $dom->getAttribute("href"); the output is URL
If I use echo $dom->getAttribute("title"); the output is Bryan Singer (dir.), James McAvoy, Michael Fassbender

So I only get a fragment of the title, not the complete attribute value.

Edit: link to my code: phpfiddle.org/main/code/dvj5-zf0q

Can anyone help? I am new to PHP DOM. Thanks in advance.

Upvotes: 0

Views: 51

Answers (1)

Jan

Reputation: 43199

To get the title attribute:

<?php
$html = <<<EOF
<html>
<a href="URL" title="{tt4438848=Nicholas Stoller (dir.), Seth Rogen, Rose Byrne, tt2567026=James Bobin (dir.), Mia Wasikowska, Johnny Depp, tt3498820=Anthony Russo (dir.), Chris Evans, Robert Downey Jr., tt2948356=Byron Howard (dir.), Ginnifer Goodwin, Jason Bateman, tt3385516=Bryan Singer (dir.), James McAvoy, Michael Fassbender, tt1985949=Clay Kaytis (dir.), Jason Sudeikis, Josh Gad, tt3068194=Whit Stillman (dir.), Kate Beckinsale, Chloë Sevigny, tt3799694=Shane Black (dir.), Russell Crowe, Ryan Gosling, tt3040964=Jon Favreau (dir.), Neel Sethi, Bill Murray, tt2241351=Jodie Foster (dir.), George Clooney, Julia Roberts}">X-Men: Apocalypse</a>
</html>
EOF;

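$dom = new DOMDocument();
// loadHTML() assumes ISO-8859-1 unless told otherwise; the XML prefix
// makes it read the markup as UTF-8 so that "Chloë" survives intact.
$dom->loadHTML('<?xml encoding="UTF-8">' . $html);

// getAttribute() returns the complete attribute value as one string.
$links = $dom->getElementsByTagName('a');
foreach ($links as $link) {
    $title = $link->getAttribute('title');
    echo $title;
}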
?>

Be aware, though, that the title does not hold a JSON string but a custom format, so json_decode() will not parse it.
See a demo on ideone.com.
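
Since the question mentions DOMXPath, the same attribute can also be read through an XPath query. The following is only a minimal sketch: the markup in $html is a shortened, made-up stand-in for the real link, and the names in it are placeholders. It also shows that json_decode() rejects the value.

```php
<?php
// Shortened, hypothetical markup in the same shape as the link in the question.
$html = '<html><body><a href="URL" title="{tt1=Director A (dir.), Actor B, tt2=Director C (dir.), Actor D}">Some Film</a></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

// Select the title attribute node of every <a> element.
$titles = [];
foreach ($xpath->query('//a/@title') as $attr) {
    $titles[] = $attr->nodeValue;
}

// The whole attribute value comes back as a single string.
echo $titles[0], PHP_EOL;

// The value is not valid JSON, so json_decode() returns null.
var_dump(json_decode($titles[0]));
```

If the real page yields only part of the title, it may be worth checking that the query targets the a element itself rather than some other node.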


To extract the individual entries from that string, you could use a regular expression like this:

\w+=((?:(?!(?:, tt)).)+)

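Applied to your problem, this would be:

// Group 1 captures everything after "id=" up to the next ", tt".
$regex = '~\w+=((?:(?!(?:, tt)).)+)~';
foreach ($links as $link) {
    // $actors[0] holds the full "id=names" matches, $actors[1] only the names.
    preg_match_all($regex, $link->getAttribute('title'), $actors);
    print_r($actors);
}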

See a demo for this one on ideone.com as well.
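
The pattern above captures only the names, and the last captured value keeps the closing brace of the title. If an id-to-names map is more convenient, something along these lines might work. It is a sketch, not part of the original answer: parseTitle() is a hypothetical helper, and it assumes every key has the form "tt" followed by digits, as in the sample.

```php
<?php
// Hypothetical helper: turn "{id=names, id=names}" into ['id' => 'names', ...].
function parseTitle(string $title): array
{
    // Drop the surrounding braces so the last value does not end in "}".
    $body = trim($title, '{} ');

    // Group 1 is the key, group 2 the value. A value runs lazily until
    // the next ", tt<digits>=" or the end of the string.
    preg_match_all('~(tt\d+)=(.+?)(?=, tt\d+=|$)~', $body, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $match) {
        $result[$match[1]] = $match[2];
    }
    return $result;
}

// Two entries taken from the title in the question.
$title = '{tt3385516=Bryan Singer (dir.), James McAvoy, Michael Fassbender, '
       . 'tt2241351=Jodie Foster (dir.), George Clooney, Julia Roberts}';

$entries = parseTitle($title);
print_r($entries);
```

Each value could then be split further with explode(', ', $value) if the individual names are needed.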

Upvotes: 2
