Jenson M John

Reputation: 5689

How to use file_get_contents() with non-English symbols in URL?

I get an error when I try to fetch non-English (Unicode) URLs with PHP's file_get_contents() function. The URL is: http://ml.wikipedia.org/wiki/%E0%B4%B2%E0%B4%AF%E0%B4%A3%E0%B5%BD_%E0%B4%AE%E0%B5%86%E0%B4%B8%E0%B5%8D%E0%B4%B8%E0%B4%BF

The error is:

Warning: file_get_contents(http://ml.wikipedia.org/wiki/%E0%B4%B2%E0%B4%AF%E0%B4%A3%E0%B5%BD_%E0%B4%AE%E0%B5%86%E0%B4%B8%E0%B5%8D%E0%B4%B8%E0%B4%BF) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden..

Fatal error: Call to a member function find() on a non-object in G:\xampp\htdocs\codes\htmlParse1.php on line 8

Is there any restriction for the file_get_contents() function? Does it only accept English URLs?

Upvotes: 4

Views: 1454

Answers (2)

Baba

Reputation: 95103

You are missing header information such as the user agent. I would advise you to just use cURL:

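$url = 'http://ml.wikipedia.org/wiki/%E0%B4%B2%E0%B4%AF%E0%B4%A3%E0%B5%BD_%E0%B4%AE%E0%B5%86%E0%B4%B8%E0%B5%8D%E0%B4%B8%E0%B4%BF';
$ch = curl_init($url); // initialize curl handle
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it directly
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17");
curl_setopt($ch, CURLOPT_REFERER, "http://ml.wikipedia.org");
curl_setopt($ch, CURLOPT_ENCODING, ""); // Accept-Encoding: any compression cURL supports (this is not a charset)
$data = curl_exec($ch); // false on failure
if ($data === false) {
    die('cURL error: ' . curl_error($ch));
}
print($data);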


If you must use file_get_contents():

$options = array(
    'http' => array(
        'method' => "GET",
        'header' => "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n" .
            "Cookie: centralnotice_bucket=0-4.2; clicktracking-session=M7EcNiC2Zcuko7exVGUvLfdwxzSK3Boap; narayam-scheme=ml\r\n" .
            "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17"
    )
);

$url = 'http://ml.wikipedia.org/wiki/%E0%B4%B2%E0%B4%AF%E0%B4%A3%E0%B5%BD_%E0%B4%AE%E0%B5%86%E0%B4%B8%E0%B5%8D%E0%B4%B8%E0%B4%BF';
$context = stream_context_create($options); // attach the headers to the HTTP request
$file = file_get_contents($url, false, $context);
echo $file;


Upvotes: 3

ConcurrentHashMap

Reputation: 5084

A 403 Forbidden response means the connection itself worked. The warning only tells you that the web server answered with status code 403. Wikipedia rejects downloads that come without a valid user agent:

Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice.

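The second error most likely comes from the lines that handle the result of your file_get_contents(...) call. When the request fails, the function returns false instead of a string, so whatever you build from that result is not an object and calling find() on it fails.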
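To make that concrete, here is a minimal sketch of guarding the result before it reaches the parser. The helper name fetchPage() is hypothetical and not part of the original code; the only assumption about PHP itself is that file_get_contents() returns false on failure, which it does for a 403 response.

```php
<?php
// Hypothetical helper (not from the original post): returns the response
// body as a string, or null when the request fails.
function fetchPage(string $url): ?string
{
    // file_get_contents() returns false on failure (for example on a 403).
    // Compare strictly against false: an empty body "" is still a success.
    // The @ only silences the warning; the failure is reported via null.
    $body = @file_get_contents($url);

    return $body === false ? null : $body;
}
```

Checking the return value of fetchPage($url) for null before handing it to the HTML parser turns the fatal "non-object" error into a condition you can handle yourself.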

Edit: You should try setting your user agent with e.g. ini_set('user_agent', 'wikiPHP'); before doing the request. That should work fine.
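As a rough sketch of that suggestion: the user agent string below, including the contact address, is an illustrative assumption, so replace it with something that really identifies your script, as the Wikipedia policy quoted above asks.

```php
<?php
// Assumed, illustrative value: a descriptive name plus contact information.
$userAgent = 'wikiPHP/1.0 (contact: you@example.com)';

// Set the default User-Agent for PHP's HTTP stream wrapper. Later
// file_get_contents('http://...') calls that do not pass their own
// context send this header.
ini_set('user_agent', $userAgent);
```

A stream context that sets its own User-Agent header takes precedence over this setting, so the two approaches should not be mixed by accident.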

Upvotes: 1
