Reputation: 21531
This is very strange: on some pages it returns the HTML fine, but on others it adds numbers to the beginning and end of the returned string ($out).
    function lookupPage($page, $return = true) {
        $fp = fsockopen("127.0.0.1", 48580, $errno, $errstr, 5);
        if (!$fp) {
            return false;
        }
        else {
            $out = "";
            $headers = "GET /" . $page . " HTTP/1.1\r\n";
            $headers .= "Host: www.site.com\r\n";
            $headers .= "Connection: Close\r\n\r\n";
            fwrite($fp, $headers);
            stream_set_timeout($fp, 300);
            $info = stream_get_meta_data($fp);
            while (!feof($fp) && !$info['timed_out'] && ($line = stream_get_line($fp, 1024)) !== false) {
                $info = stream_get_meta_data($fp);
                if ($return) $out .= $line;
            }
            fclose($fp);
            if (!$info['timed_out']) {
                if ($return) {
                    $out = substr($out, strpos($out, "\r\n\r\n") + 4);
                    return $out;
                }
                else {
                    return true;
                }
            }
            else {
                return false;
            }
        }
    }
For example:
3565
<html>
<head>
...
</html>
0
Upvotes: 1
Views: 383
Reputation: 12737
My guess would be that the server responds with chunked data.
Have a look at the Transfer Codings section of RFC 2616 (section 3.6) and its introduction.
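One way to confirm that guess is to look at the response headers before they are stripped off. The sketch below is only an illustration: `isChunkedResponse` is a hypothetical helper name, and it assumes the raw response (headers plus body) is available as one string, as it is in `$out` before the `substr` call in the question.

```php
<?php
// Hypothetical helper: reports whether a raw HTTP response (headers + body)
// declares "Transfer-Encoding: chunked" in its header block.
function isChunkedResponse($rawResponse) {
    $headerEnd = strpos($rawResponse, "\r\n\r\n");
    if ($headerEnd === false) {
        return false; // no complete header block found
    }
    $headers = substr($rawResponse, 0, $headerEnd);
    // Header names and the "chunked" token are case-insensitive.
    return preg_match('/^Transfer-Encoding:[^\r\n]*\bchunked\b/mi', $headers) === 1;
}
```

If this returns true for the pages that show the stray numbers and false for the others, the chunked explanation fits.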
Upvotes: 0
Reputation: 1022
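It is called chunked transfer encoding.
It is part of the HTTP 1.1 protocol, and you're decoding the response in an HTTP 1.0 way. The numbers are hexadecimal chunk sizes: each one gives the length of the chunk that follows, and the final 0 marks the end of the body, so the client knows it has the complete response. You can check for these values and strip them if you want, but a response may contain more than one chunk, so trimming only the first and last number is not always enough.
Also, maybe look at file_get_contents, which can fetch a URL directly.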
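If you would rather keep the socket code, the size lines can be stripped by walking the body chunk by chunk. This is a minimal sketch, not a full HTTP client: `decodeChunkedBody` is a hypothetical name, it assumes the body has already been separated from the headers (as `lookupPage` does), and it ignores any trailer headers after the last chunk.

```php
<?php
// Hypothetical helper: decodes a body sent with "Transfer-Encoding: chunked".
// Each chunk is "<hex size>[;extensions]\r\n<data>\r\n"; a size of 0 ends the body.
function decodeChunkedBody($body) {
    $decoded = '';
    $offset = 0;
    $length = strlen($body);
    while ($offset < $length) {
        $lineEnd = strpos($body, "\r\n", $offset);
        if ($lineEnd === false) {
            break; // malformed: size line is not terminated
        }
        $sizeLine = substr($body, $offset, $lineEnd - $offset);
        // Drop optional chunk extensions that follow a ';'.
        $parts = explode(';', $sizeLine, 2);
        $sizeHex = trim($parts[0]);
        if ($sizeHex === '' || !ctype_xdigit($sizeHex)) {
            break; // malformed: size is not a hexadecimal number
        }
        $size = hexdec($sizeHex);
        if ($size === 0) {
            break; // last chunk reached
        }
        $decoded .= substr($body, $lineEnd + 2, $size);
        // Skip the size line's CRLF, the chunk data, and the CRLF after the data.
        $offset = $lineEnd + 2 + $size + 2;
    }
    return $decoded;
}
```

In `lookupPage` this would be applied to `$out` after the headers are removed, and only when the response actually declares chunked encoding. Sending the request as `HTTP/1.0` instead is another option, since chunked encoding is an HTTP/1.1 feature.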
Upvotes: 2