Eaten Taik

Reputation: 938

get_meta_tags() throwing error failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden

I'm trying to get metadata from a website URL using the get_meta_tags() function. Most URLs I've tried work fine, but this one URL throws the error failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden.

I was wondering if there is a way to get past the permission check? If not, is there any way to detect whether a specific website can be accessed at all? Then I could at least handle it without the error showing up, because I need to get some information from the meta data.

My code is simply this:

get_meta_tags("https://www.udemy.com/course/beginning-c-plus-plus-programming/");

Upvotes: 1

Views: 2046

Answers (2)

widi baka

Reputation: 1

You can use cURL to fetch the page yourself:

function url_get_contents($url, $useragent = 'cURL', $headers = false, $follow_redirects = true, $debug = false) {

    // initialise the cURL library
    $ch = curl_init();

    // specify the URL to be retrieved
    curl_setopt($ch, CURLOPT_URL, $url);

    // return the contents of the URL as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    // specify the user agent: this is a required courtesy to site owners
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);

    // ignore SSL errors (insecure: avoid against untrusted hosts)
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

    // prepend headers to the output if requested; note the strict
    // comparisons below, since 'headers only' == true under PHP's
    // loose comparison and would otherwise also trigger this branch
    if ($headers === true) {
        curl_setopt($ch, CURLOPT_HEADER, 1);
    }

    // only return headers
    if ($headers === 'headers only') {
        curl_setopt($ch, CURLOPT_NOBODY, 1);
    }

    // follow redirects - note this is disabled by default in most PHP installs from 4.4.4 up
    if ($follow_redirects === true) {
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    }

    // if debugging, return an array with cURL's debug info and the URL contents
    if ($debug === true) {
        $result['contents'] = curl_exec($ch);
        $result['info'] = curl_getinfo($ch);
    }

    // otherwise just return the contents as a string
    else {
        $result = curl_exec($ch);
    }

    // free resources
    curl_close($ch);

    // send back the data
    return $result;
}
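
Since get_meta_tags() only accepts a path or URL, not an HTML string, one way to wire the fetched page back into it is the data:// stream wrapper. This is a sketch on top of the function above, and it assumes allow_url_fopen is enabled (it must already be, or the original get_meta_tags() call on a URL would not run at all):

// fetch the page with a browser-like user agent, then feed the HTML
// back into get_meta_tags() through the data:// wrapper
$html = url_get_contents(
    'https://www.udemy.com/course/beginning-c-plus-plus-programming/',
    'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'
);

$tags = get_meta_tags('data://text/html;base64,' . base64_encode($html));
var_dump($tags);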

Upvotes: 0

freeek

Reputation: 978

It looks like the site blocks PHP scripts to prevent scraping.

You can try to make the site think it is being accessed by a human in a web browser.

You can change the User-Agent header sent with the request. get_meta_tags() does not accept a stream context argument, so set the options as the default context with stream_context_set_default():

// stream_context_set_default() applies these options to every stream
// function in the process, including the request get_meta_tags() makes
stream_context_set_default(
    array(
        "http" => array(
            "header" => "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
        )
    )
);

$tags = get_meta_tags('https://www.udemy.com/course/beginning-c-plus-plus-programming/');
var_dump($tags);
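
If changing the process-wide default context is undesirable, the context can be kept local by creating it with stream_context_create() and passing it to file_get_contents(), at the cost of parsing the HTML yourself. A sketch using DOMDocument (a standard PHP extension, not part of the approach above):

$context = stream_context_create(
    array(
        "http" => array(
            "header" => "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
        )
    )
);

$html = file_get_contents(
    'https://www.udemy.com/course/beginning-c-plus-plus-programming/',
    false,
    $context
);

// DOMDocument tolerates real-world HTML; @ hides its markup warnings
$doc = new DOMDocument();
@$doc->loadHTML($html);

foreach ($doc->getElementsByTagName('meta') as $meta) {
    echo $meta->getAttribute('name'), ': ', $meta->getAttribute('content'), "\n";
}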

Here you can find a list of the most common user agents.

P.S. Keep in mind this is not really fair.

Upvotes: 0
