Reputation: 5550
I am learning PHP, and I'm trying to make an application that interacts with an external website.
I need to download its pages.
So I got this code:
$str = file_get_contents($url);
This should return the HTML contents of a website.
It works fine for most websites, but for one in particular (http://www.fxp.co.il) the output is garbage.
What is the problem? What can I do to fix it?
Thank you!
Upvotes: 1
Views: 137
Reputation: 197767
Well, you should inspect the response headers, as they tell you about the encoding of the data returned by file_get_contents().
For example, if it's gzip encoded, you need to uncompress it.
Normally you won't notice that, because file_get_contents() sends its request in a way that tells the server the client does not support compression.
However, some servers just do not care and send you compressed responses anyway:
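<?php
$url = 'http://www.fxp.co.il/';
$buffer = file_get_contents($url);
// $http_response_header is filled in by the HTTP wrapper after the request.
echo $url, '<hr>', '<pre>', implode("\n", $http_response_header), '</pre>';
// The headers report Content-Encoding: gzip, so decompress the body (needs zlib).
$bare = gzdecode($buffer);
echo '<hr>', htmlspecialchars(substr($bare, 0, 256));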
Output:
http://www.fxp.co.il/
------------------------------------------------------------
HTTP/1.1 200 OK
Server: nginx/0.7.67
Date: Mon, 29 Aug 2011 19:19:55 GMT
Content-Type: text/html; charset=UTF-8
Connection: close
Set-Cookie: bb_lastvisit=1314607056; expires=Tue, 28-Aug-2012 19:12:44 GMT; path=/
Set-Cookie: bb_lastactivity=0; expires=Tue, 28-Aug-2012 19:12:44 GMT; path=/
X-Accel-Expires: 600
Cache-control: must-revalidate, post-check=0, pre-check=0
Pragma: cache
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Content-Length: 14170
Expires: Tue, 24 Jan 1984 08:00:00 GMT
X-Header: Boost Citrus 1.9
Cache-Control: must-revalidate, post-check=0, pre-check=0
------------------------------------------------------------
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" dir="rtl" lang="he"> <head> <meta http-equiv="Content-Type" content="text/html; charset
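If you fetch from several sites, you probably don't want to call gzdecode() unconditionally, since most responses will not be compressed. A minimal sketch of a helper that decompresses only when the headers say so follows; the name decode_body is hypothetical (it is not a PHP built-in), and it assumes the zlib extension is loaded:

```php
<?php
// Hypothetical helper (not part of PHP): decompress a response body only
// when the response headers report gzip content encoding.
function decode_body(string $body, array $headers): string
{
    foreach ($headers as $header) {
        // Header names are case-insensitive, hence the /i flag.
        if (preg_match('/^Content-Encoding:\s*(?:x-)?gzip\s*$/i', $header)) {
            $decoded = gzdecode($body); // requires the zlib extension
            // gzdecode() returns false on corrupt data; fall back to the raw body.
            return $decoded === false ? $body : $decoded;
        }
    }
    return $body; // no gzip header found: the body is already plain
}
```

You would call it right after the request, for example with $buffer = file_get_contents($url); followed by $html = decode_body($buffer, $http_response_header);. Note that after redirects $http_response_header holds the headers of every response in the chain, so this sketch treats a gzip header on any of them as applying to the final body.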
Take care!
Upvotes: 2