Shawn

Reputation: 941

Parsing HTML through file_get_contents()

I have been told that the best way to parse HTML is with DOM, like this:

<?php

$html = "<span>Text</span>";
$doc = new DOMDocument();
$doc->loadHTML( $html);

$elements = $doc->getElementsByTagName("span");
foreach( $elements as $el)
{
    echo $el->nodeValue . "\n";
}


?>

But in the code above, $html can't be a URL, can it? Wouldn't I have to use the function file_get_contents() to get the HTML of a page?

Upvotes: 0

Views: 289

Answers (3)

sooper

Reputation: 6039

If you're having trouble using DOM, you could fetch the page with cURL and pull the text out with a regular expression. For example:

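$url = "http://www.davesdaily.com/";

$curl = curl_init();
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the page instead of printing it
curl_setopt($curl, CURLOPT_URL, $url);
$input = curl_exec($curl);
if ($input === false) {
    die('cURL error: ' . curl_error($curl));
}

$regexp = "<span class=comment>([^<]*)<\/span>";
if (preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // $match[0] is the whole <span> element; $match[1] is the captured text
        echo $match[1] . "\n";
    }
}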

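The script grabs the text between <span class=comment> and </span>. preg_match_all() stores every match in $matches; in each $match, index 1 holds the captured text and index 0 holds the whole element. On that page, this should echo Entertainment.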
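A regex like this is brittle: it will not match class="comment" written with quotes, and [^<]* fails as soon as the span contains a nested tag. As an alternative closer to what the question asks about, here is a minimal sketch of the same extraction with DOMDocument and DOMXPath. It runs on a made-up sample string rather than the live page, and comment_spans() is a hypothetical helper name.

```php
<?php
// Sketch: extract the text of every <span class="comment"> with DOM + XPath
// instead of a regular expression. The sample HTML below is made up.
function comment_spans(string $html): array
{
    $doc = new DOMDocument();
    // Real pages often contain invalid markup; collect libxml warnings
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $texts = [];
    // Matches class="comment" exactly (quoted or not in the source markup).
    foreach ($xpath->query('//span[@class="comment"]') as $span) {
        // textContent includes text inside nested tags such as <b>.
        $texts[] = trim($span->textContent);
    }
    return $texts;
}

$sample = '<div><span class=comment>Entertainment</span>'
        . '<span class="comment"><b>News</b></span><span>skip</span></div>';
print_r(comment_spans($sample));
```

The HTML parser normalizes the unquoted class=comment attribute, so both spans are found, and the nested <b> does not break the match the way it would with the regex.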

Upvotes: -1

Saxoier

Reputation: 1287

You have to use DOMDocument::loadHTMLFile to load HTML from a URL:

$path = "http://www.example.com/"; // example URL; a local file path also works
$doc = new DOMDocument();
$doc->loadHTMLFile($path);

DOMDocument::loadHTML, on the other hand, parses a string of HTML, so you fetch the page yourself first:

$doc = new DOMDocument();
$doc->loadHTML(file_get_contents($path));
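A minimal sketch contrasting the two calls, plus the common mistake of handing a path to loadHTML(). To keep it runnable offline, a temporary local file stands in for the remote page; the assumption is that, with allow_url_fopen enabled, an http:// URL works in the same places as $path.

```php
<?php
// Sketch: loadHTMLFile() takes a path or URL; loadHTML() takes markup.
// A temporary local file stands in for a remote page so this runs offline.
libxml_use_internal_errors(true); // collect parser warnings instead of printing them

$path = tempnam(sys_get_temp_dir(), 'html');
file_put_contents($path, '<html><body><span>Text</span></body></html>');

// 1. Let DOMDocument read the file (or URL) itself.
$fromFile = new DOMDocument();
$fromFile->loadHTMLFile($path);

// 2. Read the markup yourself, then parse the string.
$fromString = new DOMDocument();
$fromString->loadHTML(file_get_contents($path));

// 3. Mistake: loadHTML() parses the path string itself as text; nothing is read.
$wrong = new DOMDocument();
$wrong->loadHTML($path);

unlink($path);

echo $fromFile->getElementsByTagName('span')->item(0)->nodeValue, "\n";   // Text
echo $fromString->getElementsByTagName('span')->item(0)->nodeValue, "\n"; // Text
echo $wrong->getElementsByTagName('span')->length, "\n";                  // 0
```

Both of the first two approaches end up with the same tree; the third finds no <span> because the document only contains the path as plain text.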

Upvotes: 1

Marc B

Reputation: 360572

It can be, but only if allow_url_fopen is enabled in your PHP install. Almost all of PHP's file-based functions accept a URL as a source (or destination). Whether a given URL makes sense depends on what you're trying to do.

For example, file_put_contents('http://google.com', $data) is not going to work, as you'd be attempting an HTTP upload to Google. PHP's http:// wrapper doesn't support writing, and Google isn't going to let you replace their homepage anyway.

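But $dom->loadHTMLFile('http://google.com'); would work, and would pull Google's homepage into the DOM for processing. Note that $dom->loadHTML('http://google.com') would not: loadHTML() treats its argument as an HTML string, so it only parses the URL text.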
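Because the URL support depends on allow_url_fopen, a fetch helper can check that setting and fall back to cURL when it is off. This is a minimal sketch under those assumptions: fetch_html() is a hypothetical name, the http(s):// prefix check is deliberately simple, and the fallback needs the curl extension.

```php
<?php
// Sketch: read HTML with file_get_contents() when allowed, else fall back to cURL.
// fetch_html() is a hypothetical helper, not part of PHP.
function fetch_html(string $source): string
{
    $isUrl = (bool) preg_match('#^https?://#i', $source);
    $urlFopen = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);

    // Local paths always work; URLs only when allow_url_fopen is enabled.
    if (!$isUrl || $urlFopen) {
        $html = file_get_contents($source);
        if ($html === false) {
            throw new RuntimeException("Could not read $source");
        }
        return $html;
    }

    if (!function_exists('curl_init')) {
        throw new RuntimeException('allow_url_fopen is off and cURL is not available');
    }
    $curl = curl_init($source);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $html = curl_exec($curl);
    if ($html === false) {
        throw new RuntimeException('cURL error: ' . curl_error($curl));
    }
    // The handle is released automatically when $curl goes out of scope.
    return $html;
}
```

Usage, assuming network access: $dom = new DOMDocument(); $dom->loadHTML(fetch_html('http://google.com'));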

Upvotes: 0
