Reputation: 105
I'm trying to scrape a page for Facebook links. However, I get a blank page, without any error message.
My code is as follows:
<?php
error_reporting(E_ALL);

function getFacebook($html) {
    $matches = array();
    if (preg_match('~^https?://(?:www\.)?facebook.com/(.+)/?$~', $html, $matches)) {
        print_r($matches);
    }
}

$html = file_get_contents('http://curvywriter.info/contact-me/');
getFacebook($html);
What's wrong with it?
Upvotes: 0
Views: 183
Reputation: 175017
A better alternative (and more robust) would be to use DOMDocument and DOMXPath:
<?php
error_reporting(E_ALL);

function getFacebook($html) {
    $dom = new DOMDocument;
    @$dom->loadHTML($html);
    $query = new DOMXPath($dom);
    $result = $query->evaluate("(//a|//A)[contains(@href, 'facebook.com')]");
    $return = array();
    foreach ($result as $element) {
        /** @var $element DOMElement */
        $return[] = $element->getAttribute('href');
    }
    return $return;
}

$html = file_get_contents('http://curvywriter.info/contact-me/');
var_dump(getFacebook($html));
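To see what the DOMXPath approach returns without depending on the live site, you can feed the same logic a small inline HTML snippet (the markup below is made up for illustration):

```php
<?php
// Self-contained check of the DOMXPath approach, using inline HTML
// instead of fetching a live page over the network.
$html = '<html><body>'
      . '<a href="https://www.facebook.com/curvywriter">Facebook</a>'
      . '<a href="https://twitter.com/someone">Twitter</a>'
      . '</body></html>';

$dom = new DOMDocument;
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$links = array();
foreach ($xpath->evaluate("//a[contains(@href, 'facebook.com')]") as $a) {
    $links[] = $a->getAttribute('href');
}

print_r($links); // only the facebook.com link remains
```

The Twitter link is filtered out by the `contains(@href, 'facebook.com')` predicate, so only real anchor hrefs survive, regardless of where they sit in the document.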
For your specific problem, however, I did the following things:

- Changed preg_match to preg_match_all, in order to not stop after the first match.
- Removed the ^ (start) and $ (end) anchors from the pattern. Your links will appear in the middle of the document, not at the beginning or end (definitely not both!).

So the corrected code:
<?php
error_reporting(E_ALL);

function getFacebook($html) {
    $matches = array();
    if (preg_match_all('~https?://(?:www\.)?facebook.com/(.+)/?~', $html, $matches)) {
        print_r($matches);
    }
}

$html = file_get_contents('http://curvywriter.info/contact-me/');
getFacebook($html);
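The difference matters: against input containing several links, preg_match fills $matches with at most one result, while preg_match_all collects them all. A quick illustration with made-up input (the pattern is simplified to `(\w+)` here so the captures don't run past each link):

```php
<?php
// Demonstrates preg_match (first hit only) vs preg_match_all (every hit).
$html = 'See https://www.facebook.com/pageone and https://www.facebook.com/pagetwo here.';

preg_match('~https?://(?:www\.)?facebook\.com/(\w+)~', $html, $single);
preg_match_all('~https?://(?:www\.)?facebook\.com/(\w+)~', $html, $all);

print_r($single[1]); // "pageone" — stops after the first match
print_r($all[1]);    // every capture: pageone, pagetwo
```

Note also that with preg_match_all, $matches[0] holds the full matched URLs and $matches[1] holds the first capture group, so index accordingly.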
Upvotes: 1