Extracting description out of an HTML page

Question

I am trying to extract the title and description from web pages using DOMDocument. I can extract the title like this:

$d=new DOMDocument();
$d->loadHTML($html);
$title=$d->getElementsByTagName("title")->item(0)->textContent;

I can extract the description by looping through all the meta tags and checking for the name="description" attribute, but looping makes the process slow. Is there a direct way to extract the content using some kind of attribute selector in PHP's DOMDocument?

hellsgate · Accepted Answer

I don't think this can be done with DOMDocument alone, but it is possible in combination with DOMXPath:

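// Sample page; the meta description text is placeholder content
$html = '<html>
<head>
<title>Dom - Xpath test</title>
<meta name="description" content="This is the description">
</head>
<body>
This is the test HTML
</body>
</html>';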

$dom = new DOMDocument();
$dom->loadHTML($html);
$domx = new DOMXPath($dom);
$desc = $domx->query("//meta[@name='description']");

// Print the content attribute of every matching meta element
foreach ($desc as $item) {
    echo $item->getAttribute('content'), PHP_EOL;
}
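
If only the first description is needed, the loop can probably be skipped altogether by asking XPath for a string instead of a node list. The following is a minimal sketch, not part of the original answer: the helper name extractDescription and the sample markup are hypothetical, and it assumes the page carries at most one relevant meta description. It relies on DOMXPath::evaluate() and the XPath string() function, which returns the value of the first matching node or an empty string when nothing matches.

```php
<?php
// Hypothetical helper: returns the meta description, or null when it is
// missing (an empty content attribute is also treated as missing).
function extractDescription(string $html): ?string
{
    $dom = new DOMDocument();

    // Real-world pages are rarely valid markup; collect libxml parse
    // warnings internally instead of emitting them, then restore the setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // string(...) converts the node set to the value of its first node,
    // so no loop over the result is required.
    $content = $xpath->evaluate("string(//meta[@name='description']/@content)");

    return $content === '' ? null : $content;
}

// Sample markup, made up for illustration.
$html = '<html><head><title>Dom - Xpath test</title>'
      . '<meta name="description" content="Sample description"></head>'
      . '<body>This is the test HTML</body></html>';

echo extractDescription($html), PHP_EOL;
```

Note that the attribute comparison is case-sensitive, so a page that writes name="Description" would not match this expression.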
