Ash Van Wilder

Reputation: 105

Trying to scrape all Facebook links from a web page

I'm trying to scrape a page for Facebook links. However, I get a blank page, with no error message.

My code is as follows:

<?php
error_reporting(E_ALL);

function getFacebook($html) {

    $matches = array();
    if (preg_match('~^https?://(?:www\.)?facebook.com/(.+)/?$~', $html, $matches)) {
        print_r($matches);

    }
}

$html = file_get_contents('http://curvywriter.info/contact-me/');

getFacebook($html);

What's wrong with it?

Upvotes: 0

Views: 183

Answers (1)

Madara's Ghost

Reputation: 175017

A better (and more robust) alternative would be to use DOMDocument and DOMXPath:

<?php
error_reporting(E_ALL);

function getFacebook($html) {

    $dom = new DOMDocument;
    @$dom->loadHTML($html);

    $query = new DOMXPath($dom);

    $result = $query->evaluate("(//a|//A)[contains(@href, 'facebook.com')]");

    $return = array();

    foreach ($result as $element) {
        /** @var $element DOMElement */
        $return[] = $element->getAttribute('href');
    }

    return $return;

}

$html = file_get_contents('http://curvywriter.info/contact-me/');

var_dump(getFacebook($html));
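As a side note, the `@` on `loadHTML` silences parser warnings for the whole call. A minimal sketch of the same extraction using libxml's error collection instead (assuming the `dom` and `libxml` extensions, which ship with PHP by default):

```php
<?php
// Sketch: collect HTML parse warnings via libxml instead of suppressing
// them with the @ operator, then restore the previous error handler.
function getFacebookLinks($html) {
    $previous = libxml_use_internal_errors(true); // buffer parse warnings

    $dom = new DOMDocument;
    $dom->loadHTML($html);

    libxml_clear_errors();                 // discard the buffered warnings
    libxml_use_internal_errors($previous); // restore the old behavior

    $query = new DOMXPath($dom);
    $result = $query->evaluate("//a[contains(@href, 'facebook.com')]");

    $return = array();
    foreach ($result as $element) {
        $return[] = $element->getAttribute('href');
    }

    return $return;
}
```

This keeps the warnings inspectable (via `libxml_get_errors()`) rather than throwing them away globally.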

For your specific problem, however, I did the following things:

  • Changed preg_match to preg_match_all, so matching doesn't stop after the first find.
  • Removed the ^ (start) and $ (end) anchors from the pattern. Your links will appear in the middle of the document, not at the beginning or end (and definitely not both!)

So the corrected code:

<?php
error_reporting(E_ALL);

function getFacebook($html) {

    $matches = array();
    if (preg_match_all('~https?://(?:www\.)?facebook\.com/(.+)/?~', $html, $matches)) {
        print_r($matches);

    }
}

$html = file_get_contents('http://curvywriter.info/contact-me/');

getFacebook($html);
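One caveat: the greedy `(.+)` runs to the end of the line, so the capture can swallow trailing markup after the URL. A minimal sketch with a tightened character class (the input string here is hypothetical):

```php
<?php
// Sketch: exclude whitespace, quotes, and angle brackets from the capture
// so the match stays inside the href attribute value.
$html = '<a href="https://www.facebook.com/curvywriter/">Contact</a>';
preg_match_all('~https?://(?:www\.)?facebook\.com/([^\s"\'<>]+)~', $html, $matches);
print_r($matches[1]); // captured path segments, e.g. curvywriter/
```

`$matches[0]` holds the full URLs and `$matches[1]` the captured path segments.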

Upvotes: 1
