user2239695

Reputation: 57

PHP - file_get_contents doesn't get all contents (tried with curl, same result)

I am trying to get the source code of a web page.

$urlArena = 'http://arenavision.in/';
$suffixeSchedule = 'schedule';
$url = $urlArena.$suffixeSchedule;
//url = http://arenavision.in/schedule
$text = file_get_contents($url);
$fp = fopen('data.txt', 'w');
// 'H:i:s' is hours:minutes:seconds; 'm' would print the month, not the minutes
$text .= date('d-m-Y H:i:s');
fwrite($fp, $text);
fclose($fp);

I write it to a file to check what $text contains:

<html>
   <head>
        <script type="text/javascript">
              //<![CDATA[
              try{if (!window.CloudFlare) {var CloudFlare=
           [{verbose:0,p:0,byc:0,owlid:"cf",bag2:1,mirage2:0,oracle:0,paths:{cloudflare:"/cdn-cgi/nexp/dok3v=1613a3a185/"},atok:"aea30972f99dcd729c29d94acbb3cc58",petok:"87f9b51be2424b953e36dd5ec0f8ce1b0f74a3b5-1493799639-1800",zone:"arenavision.in",rocket:"a",apps:{}}];document.write('<script type="text/javascript" src="//ajax.cloudflare.com/cdn-cgi/nexp/dok3v=85b614c0f6/cloudflare.min.js"><'+'\/script>');}}catch(e){};
//]]>
        </script>
        <script type="text/rocketscript">function set_cookie(){var now = new 
            Date();var time = now.getTime();time += 19360000 * 
            1000;now.setTime(time);document.cookie='beget=begetok'+'; 
            expires='+now.toGMTString()+'; path=/';}set_cookie();location.reload();;
        </script>
    </head>
    <body></body>
</html>
03-05-2017 08:05:46

Is there a script on the web page that blocks file_get_contents? Can I work around it?

I tried with curl, but I get the same result. With another website (google.com), I was able to get the full source code.

Thanks in advance for any help,

G.

Upvotes: 0

Views: 1266

Answers (3)

user2239695

Reputation: 57

Thanks to Alex Slipknot and hassan.

Both of your explanations helped me a lot to understand the problem. It works now :)
Here is my final code:

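$urlArena = 'http://arenavision.in/';
$suffixeSchedule = 'schedule';
$url = $urlArena.$suffixeSchedule;
$text = get_data($url);
if ($text === false) {
    exit('Could not fetch '.$url);
}
// 'H:i:s' is hours:minutes:seconds; 'm' would print the month, not the minutes
$text .= date('d-m-Y H:i:s');
$fp = fopen('data.txt', 'w');
fwrite($fp, $text);
fclose($fp);

// Fetch the page, sending the cookie that the first response asks for.
function get_data($url) {
    $cookie = get_cookie($url);
    if ($cookie === '') {
        error_log('error: no cookie found for '.$url);
        return false;
    }
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_COOKIE, $cookie);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

// Request the page without a cookie and extract the value assigned to
// document.cookie. Returns an empty string if nothing is found.
function get_cookie($url)
{
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    // The dot is escaped so it only matches a literal "document.cookie".
    if (!preg_match('/document\.cookie=\'([^\']+)\'/', (string) $data, $m)) {
        return '';
    }
    return $m[1];
}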

Upvotes: 0

hassan

Reputation: 8308

The website requires a cookie before it will serve the page you want.

Here is the scenario:

1) Request the first page, http://arenavision.in, with curl.

2) Use a regex to extract this value:

document.cookie='beget=begetok'
//               ^^^^^^^^^^^^^

3) Send that cookie value with the next request.
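
Step 2 could be sketched in PHP roughly as follows. The function name extract_js_cookie and the shortened sample string are only illustrative assumptions; the pattern simply looks for the first quoted value assigned to document.cookie.

```php
<?php
// Hypothetical helper: pull the "name=value" pair out of a JavaScript
// assignment such as document.cookie='beget=begetok'+'; expires=...'.
function extract_js_cookie(string $html): ?string
{
    // The dot is escaped so it only matches a literal "document.cookie".
    if (preg_match('/document\.cookie\s*=\s*\'([^\']+)\'/', $html, $m) === 1) {
        return $m[1];
    }
    return null; // no cookie assignment found
}

// Shortened version of the stub page shown in this thread.
$sample = "<script>function set_cookie(){document.cookie='beget=begetok'+'; path=/';}</script>";
echo extract_js_cookie($sample), PHP_EOL; // prints "beget=begetok"
```

Returning null when nothing matches makes it easy to tell "no challenge page" apart from an empty cookie.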


Here is a quick example using curl on the command line:

curl 'http://arenavision.in/'

Outputs:

<html><head><script>function set_cookie(){var now = new Date();var time = now.getTime();time += 19360000 * 1000;now.setTime(time);document.cookie='beget=begetok'+'; expires='+now.toGMTString()+'; path=/';}set_cookie();location.reload();;</script></head><body></body></html>

Using the value of document.cookie in the next request does the trick:

curl 'http://arenavision.in/' -H 'Cookie: beget=begetok'
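
Since the question started with file_get_contents, the same header can probably be sent from PHP through a stream context. This is a minimal sketch, not a tested solution for this particular site; the helper names cookie_context_options and fetch_with_cookie are made up for illustration.

```php
<?php
// Hypothetical helper: build HTTP context options that send a Cookie header,
// so file_get_contents() behaves like `curl -H 'Cookie: ...'`.
function cookie_context_options(string $cookie, int $timeout = 5): array
{
    return [
        'http' => [
            'method'  => 'GET',
            'header'  => "Cookie: {$cookie}\r\n",
            'timeout' => $timeout, // read timeout in seconds
        ],
    ];
}

// Hypothetical wrapper: returns the response body, or false on failure.
function fetch_with_cookie(string $url, string $cookie)
{
    $context = stream_context_create(cookie_context_options($cookie));
    return file_get_contents($url, false, $context);
}

// Usage (performs a network request, so it is left commented out):
// $html = fetch_with_cookie('http://arenavision.in/schedule', 'beget=begetok');
```

Keeping the options in their own function makes it possible to inspect the header without making a request.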

Upvotes: 1

Alex Slipknot

Reputation: 2533

The content on this site is generated dynamically, so a plain request does not return the full page that you see in a browser.


The site is also protected by some cloud service, but you can get the full page by providing the cookie in your request.


You have to emulate a real user: accept the cookie from the first response, then send it with the next request. You can use cURL to achieve this.
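
One way to sketch that two-step flow in PHP is below. It assumes the cookie is assigned from JavaScript, as in the response shown in the question, so it is read from the page body rather than from a Set-Cookie header. fetch_like_browser and curl_fetch are hypothetical names, and the fetcher is passed in as a callable so the logic can be tried without a network connection.

```php
<?php
// Hypothetical flow: request once, and if the reply is only the
// cookie-setting stub, repeat the request with that cookie attached.
// $fetch is any callable taking (url, cookie-or-null) and returning
// the body as a string, or false on failure.
function fetch_like_browser(string $url, callable $fetch)
{
    $first = $fetch($url, null);
    if ($first === false) {
        return false;
    }
    // Look for a JavaScript assignment like document.cookie='name=value'.
    if (preg_match('/document\.cookie\s*=\s*\'([^\']+)\'/', $first, $m) !== 1) {
        return $first; // no stub detected, treat this as the real page
    }
    return $fetch($url, $m[1]); // second request, now with the cookie
}

// One possible cURL-based fetcher (an assumption, adapt as needed).
function curl_fetch(string $url, ?string $cookie)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    if ($cookie !== null) {
        curl_setopt($ch, CURLOPT_COOKIE, $cookie);
    }
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

// Usage (network request, left commented out):
// $html = fetch_like_browser('http://arenavision.in/schedule', 'curl_fetch');
```

The sketch retries only once, so a page that keeps returning the stub will not cause a loop.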

Upvotes: 1
