Keith Varias

Reputation: 85

Extracting Link Text From Specific Links

I'm trying to figure out how I can get only the titles of the movies from this page.

I have the code below, but I cannot get it to do what I want, and I don't know much about DOMDocument. It currently gets all the links on the page. However, I only need the links for the listed movie titles.

$content = file_get_contents("http://www.imdb.com/movies-in-theaters/");

$dom = new DOMDocument();
$dom->loadHTML($content);
$urls = $dom->getElementsByTagName('a');

Upvotes: 1

Views: 56

Answers (1)

kittycat

Reputation: 15044

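$dom = new DOMDocument();
// The @ suppresses the warnings libxml raises for malformed real-world HTML.
@$dom->loadHTMLFile('http://www.imdb.com/movies-in-theaters/');
$urls = $dom->getElementsByTagName('a');
$titles = array();

foreach ($urls as $url)
{
    // Keep only links whose grandparent element has the class "overview-top".
    if ('overview-top' === $url->parentNode->parentNode->getAttribute('class'))
    {
        $titles[] = $url->nodeValue;
    }
}

print_r($titles);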

At the time of writing, this outputs:

Array
(
    [0] =>  Star Trek Into Darkness (2013)
    [1] =>  Frances Ha (2012)
    [2] =>  Stories We Tell (2012)
    [3] =>  Erased (2012)
    [4] =>  The English Teacher (2013)
    [5] =>  Augustine (2012)
    [6] =>  Black Rock (2012)
    [7] =>  State 194 (2012)
    [8] =>  Iron Man 3 (2013)
    [9] =>  The Great Gatsby (2013)
    [10] =>  Pain & Gain (2013)
    [11] =>  Peeples (2013)
    [12] =>  42 (2013)
    [13] =>  Oblivion (2013)
    [14] =>  The Croods (2013)
    [15] =>  The Big Wedding (2013)
    [16] =>  Mud (2012)
    [17] =>  Oz the Great and Powerful (2013)
)

You can use XPath to do this as well, but I don't know it well enough to do it that way.
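For anyone who wants to try the XPath route, here is a rough sketch of what it might look like. It applies the same rule as the loop above (a link whose grandparent has the class "overview-top"), expressed as a single query. The inline $html string is made-up sample markup, not the real IMDb page, and the nesting it uses (a link inside an h4 inside the "overview-top" element) is an assumption; I use a local string so the example runs without a network request.

```php
<?php
// Hypothetical sample markup standing in for the live page (assumed structure).
$html = <<<'HTML'
<html><body>
<div class="overview-top"><h4><a href="/title/tt0000001/"> First Movie (2013)</a></h4></div>
<div class="overview-top"><h4><a href="/title/tt0000002/"> Second Movie (2012)</a></h4></div>
<p><a href="/help/">Help</a></p>
</body></html>
HTML;

// Collect parser warnings internally instead of using the @ operator.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$titles = array();

// Select every <a> whose grandparent has class exactly "overview-top".
foreach ($xpath->query('//*[@class="overview-top"]/*/a') as $link) {
    // trim() drops the leading space seen in the link text.
    $titles[] = trim($link->nodeValue);
}

print_r($titles);
```

Note that @class="overview-top" only matches when the attribute is exactly that string, which mirrors the === comparison in the loop version. If the element carried several classes, something like contains(concat(" ", normalize-space(@class), " "), " overview-top ") would probably be needed instead.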

Upvotes: 2
