Steven

Reputation: 19425

file_get_contents() gives me 403 Forbidden

I have a partner who has created some content for me to scrape.
I can access the page with my browser, but when I try to use file_get_contents, I get a 403 Forbidden.

I've tried using stream_context_create, but that's not helping. It might be because I don't know what should go in there.

1) Is there any way for me to scrape the data?
2) If not, and if my partner is not allowed to configure the server to give me access, what can I do then?

The code I've tried using:

$opts = array(
  'http'=>array(
    'user_agent' => 'My company name',
    'method'=>"GET",
    'header'=> implode("\r\n", array(
      'Content-type: text/plain;'
    ))
  )
);

$context = stream_context_create($opts);

//Get header content
$_header = file_get_contents($partner_url,false, $context);

Upvotes: 29

Views: 42329

Answers (4)

Santiago B.

Reputation: 36

Also, if for some reason you're requesting an HTTP resource that actually lives on your own server, you can save yourself some trouble by reading the file through its absolute path instead.

Like: /home/sally/statusReport/myhtmlfile.html
instead of
https://example.org/myhtmlfile.html
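
As a rough sketch of that idea, the file can be read straight from disk, which never touches the web server and so cannot trigger a 403. The helper name read_local_file is made up for illustration, and the path is the example path from above:

```php
<?php
// Hypothetical helper: reads a local file directly instead of fetching it over HTTP.
// Returns null when the path is missing or unreadable.
function read_local_file(string $path): ?string
{
    if (!is_file($path) || !is_readable($path)) {
        return null;
    }
    $contents = file_get_contents($path);
    return $contents === false ? null : $contents;
}

// Usage with the example path from this answer:
// $html = read_local_file('/home/sally/statusReport/myhtmlfile.html');
```

This only applies when the script and the file are on the same machine and the PHP process has read permission on that path.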

Upvotes: 2

Cleric

Reputation: 3193

This is not a problem in your script; it's a security feature of your partner's web server.

It's hard to say exactly what's blocking you, but most likely it's some sort of protection against scraping. If your partner has access to the web server's configuration, that might help pinpoint the cause.

What you could do is "fake a web browser" by setting the User-Agent header so that your request imitates a standard web browser.

I would recommend cURL for this; it is well documented and easy to find examples for.

    // create curl resource
    $ch = curl_init();

    // set url
    curl_setopt($ch, CURLOPT_URL, "example.com");

    // return the transfer as a string
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');

    // $output contains the output string
    $output = curl_exec($ch);

    // close curl resource to free up system resources
    curl_close($ch); 
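
Since it's hard to tell what is blocking the request, it may also help to look at the status code the server actually returns. Below is a minimal sketch of that, not a definitive implementation. The function name fetch_with_status, the returned array keys, the timeout and the redirect setting are all assumptions added for illustration:

```php
<?php
// Hypothetical helper: fetches a URL with a given User-Agent and reports
// the HTTP status code alongside the body, so a 403 can be told apart
// from a network failure.
function fetch_with_status(string $url, string $userAgent): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // assumption: redirects are acceptable
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // assumption: 10 seconds is enough

    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error  = curl_error($ch);

    return [
        'status' => $status,                       // 0 if no HTTP response was received
        'body'   => $body === false ? null : $body,
        'error'  => $error,                        // empty string when cURL reported no error
    ];
}

// Usage (performs a real request, so it is left commented out):
// $result = fetch_with_status('https://example.com/', 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)');
// if ($result['status'] === 403) { /* still blocked: the User-Agent alone was not enough */ }
```

If the status is still 403 with a browser-like User-Agent, the block is probably based on something else (IP address, cookies, referrer), and only the partner's server configuration can tell you which.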

Upvotes: 42

Abid Hussain

Reputation: 7762

//set User Agent first

ini_set('user_agent','Mozilla/4.0 (compatible; MSIE 6.0)'); 
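
If changing the global ini setting is not desirable, the same User-Agent can be passed per request through a stream context. This is only a sketch: the helper name build_browser_context, the User-Agent string and the timeout are assumptions, not something from the question:

```php
<?php
// Hypothetical helper: builds a per-request HTTP context instead of
// changing the global user_agent ini setting.
function build_browser_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'        => 'GET',
            'user_agent'    => $userAgent,
            'ignore_errors' => true, // return the response body even on 4xx/5xx
            'timeout'       => 10,   // assumption: 10 seconds
        ],
    ]);
}

$context = build_browser_context('Mozilla/5.0 (compatible; ExampleFetcher/1.0)');

// Usage (performs a real request, so it is left commented out):
// $html = file_get_contents($partner_url, false, $context);
```

With ignore_errors enabled, file_get_contents returns the error page body instead of false, which can make it easier to see why the server refused the request.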

Upvotes: 33

ARIF MAHMUD RANA

Reputation: 5166

Two things come to mind. First, if you're opening a URI with special characters, such as spaces, you need to encode the URI with urlencode(). Second, a URL can be used as a filename with this function only if the fopen wrappers have been enabled.
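
A small sketch of both checks follows. The base URL and the query value are made up for illustration, and only the part that contains special characters is encoded, since encoding the whole URL would also escape the scheme and slashes:

```php
<?php
// Hypothetical base URL and query value, for illustration only.
$base  = 'https://example.org/search';
$query = 'status report 2012';

// Encode only the component that contains special characters.
$url = $base . '?q=' . urlencode($query);

// For a path segment, rawurlencode() encodes spaces as %20 instead of +.
$file = rawurlencode('my file.html');

// file_get_contents() can open URLs only when allow_url_fopen is enabled.
$wrappersEnabled = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);

if (!$wrappersEnabled) {
    // Assumption: fall back to cURL or ask the host to enable the setting.
    error_log('allow_url_fopen is disabled; file_get_contents() cannot open URLs.');
}
```

Neither check explains a 403 on its own, but both rule out common causes of file_get_contents failing on a remote URL.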

Upvotes: 0
