parsecer

Reputation: 5106

simple html dom failed to open stream for a site

I'm trying to parse the http://whatismyip.com page to get my location (state and country). The data seems to be inside <table class="table"> tags, so I'm looking for "table". But I get this warning:

Warning: file_get_contents(https://whatismyip.com): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden in C:\xampp4\htdocs\scraping\libs\simple_html_dom.php on line 1081

Can't figure out what's wrong.

 <?php
        require_once('libs/simple_html_dom.php');
        $html=new simple_html_dom();

        $html->load_file('https://whatismyip.com');

        $element=$html->find("table");


    ?>

Upvotes: 4

Views: 11139

Answers (3)

Sarang Kartikey

Reputation: 111

Try changing the user agent with the following setting before making the request:

ini_set("user_agent","Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0");

The request should then go through.
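
A minimal sketch of where that call would sit, assuming the library path from the question. The `user_agent` setting is read by PHP's HTTP stream wrapper, which `file_get_contents()` uses (and the warning in the question shows `load_file()` going through `file_get_contents()`). The Simple HTML DOM lines are left commented out so the snippet runs without the library or a network connection.

```php
<?php
// Browser-like User-Agent string taken from the answer above.
$userAgent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';

// Must be set before any HTTP request is made through the stream wrapper.
ini_set('user_agent', $userAgent);

// Confirm that the setting took effect.
echo ini_get('user_agent'), PHP_EOL;

// With the library from the question (the path is an assumption):
// require_once 'libs/simple_html_dom.php';
// $html = new simple_html_dom();
// $html->load_file('https://whatismyip.com');
// $element = $html->find('table');
```

Whether the site accepts this particular User-Agent string is up to the site; the setting only controls what PHP sends.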

Upvotes: 3

Your example is basically fine. The problem appears to be that the Simple HTML DOM classes do not cope with this HTTPS request, so try fetching the page another way, with cURL:

$curl = curl_init();
// Skips TLS certificate verification; acceptable for a local test only.
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
// Return only the body, without the response headers.
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, "https://whatismyip.com");
curl_setopt($curl, CURLOPT_REFERER, "https://whatismyip.com");
// Make curl_exec() return the page as a string instead of printing it.
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
// Send a browser-like User-Agent header.
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201');
$str = curl_exec($curl);
curl_close($curl);

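Then use your code, with one change: $str already holds the HTML, so pass it to load() instead of load_file(), which expects a path or URL.

    $html->load($str);
    $element=$html->find("table");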

Edit: Added a User-Agent to emulate a real browser (thanks to ShiraNai7).

Upvotes: 5

Shira

Reputation: 6560

That website is checking the User-Agent header of the request but PHP doesn't send any (by default). You'll have to "impersonate" a browser:

$context = stream_context_create(array(
    'http' => array(
        'header' => array('User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201'),
    ),
));

$html = file_get_contents('http://whatismyip.com/', false, $context);

// do what you want with the $html
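
For the "do what you want" step, here is a rough sketch that pulls label/value pairs out of a table using PHP's built-in DOM extension rather than Simple HTML DOM. The sample markup, the "State"/"Country" labels and the `$location` array are invented for illustration; the real page's structure may well differ.

```php
<?php
// Stand-in for the string returned by file_get_contents(). This markup is
// an assumption made for the example, not the site's actual HTML.
$html = '<html><body><table class="table">'
      . '<tr><th>State</th><td>Example State</td></tr>'
      . '<tr><th>Country</th><td>Example Country</td></tr>'
      . '</table></body></html>';

// Real-world pages are rarely valid HTML, so collect libxml parse
// warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$location = array();

// Turn each row of the table into a label => value pair.
foreach ($xpath->query('//table[@class="table"]//tr') as $row) {
    $label = $xpath->query('th', $row)->item(0);
    $value = $xpath->query('td', $row)->item(0);
    if ($label !== null && $value !== null) {
        $location[trim($label->textContent)] = trim($value->textContent);
    }
}

print_r($location);
```

The XPath expression would need adjusting to whatever markup the page actually returns.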

A better (and faster) option would be to use some library for this. I've used GeoIP2-php before but I'm sure there are more.

Upvotes: 10
