I'm trying to get the source code of a webpage:
$urlArena = 'http://arenavision.in/';
$suffixeSchedule = 'schedule';
$url = $urlArena.$suffixeSchedule;
//url = http://arenavision.in/schedule
$text = file_get_contents($url);
$fp = fopen('data.txt', 'w');
$text .= date('d-m-Y H:i:s'); // H = hour (00-23), i = minutes; "m" would be the month
fwrite($fp, $text);
fclose($fp);
I write it to a file to check what $text actually contains:
<html>
<head>
<script type="text/javascript">
//<![CDATA[
try{if (!window.CloudFlare) {var CloudFlare=
[{verbose:0,p:0,byc:0,owlid:"cf",bag2:1,mirage2:0,oracle:0,paths:{cloudflare:"/cdn-cgi/nexp/dok3v=1613a3a185/"},atok:"aea30972f99dcd729c29d94acbb3cc58",petok:"87f9b51be2424b953e36dd5ec0f8ce1b0f74a3b5-1493799639-1800",zone:"arenavision.in",rocket:"a",apps:{}}];document.write('<script type="text/javascript" src="//ajax.cloudflare.com/cdn-cgi/nexp/dok3v=85b614c0f6/cloudflare.min.js"><'+'\/script>');}}catch(e){};
//]]>
</script>
<script type="text/rocketscript">function set_cookie(){var now = new Date();var time = now.getTime();time += 19360000 * 1000;now.setTime(time);document.cookie='beget=begetok'+'; expires='+now.toGMTString()+'; path=/';}set_cookie();location.reload();;
</script>
</head>
<body></body>
</html>
03-05-2017 08:05:46
Is there a script on the webpage that blocks file_get_contents? Can I get around it?
I tried with cURL but got the same result. With another website (google.com) I was able to get the full source code.
Thanks in advance for any help,
G.
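To confirm that a response is this kind of script-based cookie check rather than the real page, a simple string test is enough. This is a minimal sketch, assuming the check page always contains document.cookie= and location.reload() as in the output above; is_cookie_challenge() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: returns true when a response looks like the
// JavaScript cookie check shown above (it sets document.cookie, then
// reloads the page) instead of the real page content.
function is_cookie_challenge(string $html): bool
{
    return strpos($html, 'document.cookie=') !== false
        && strpos($html, 'location.reload()') !== false;
}

$sample = "<html><head><script>function set_cookie(){document.cookie='beget=begetok'"
    . "+'; path=/';}set_cookie();location.reload();;</script></head><body></body></html>";

var_dump(is_cookie_challenge($sample));                               // bool(true)
var_dump(is_cookie_challenge('<html><body>Schedule</body></html>')); // bool(false)
```

If this returns true, neither file_get_contents nor cURL is being "cancelled": the server simply returned a page whose script only runs in a browser.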
Thanks to Alex Slipknot and hassan.
Both of your explanations helped me a lot to understand the problem.
It works now :)
Here is my final code:
$urlArena = 'http://arenavision.in/';
$suffixeSchedule = 'schedule';
$url = $urlArena.$suffixeSchedule;
$text = get_data($url);
$fp = fopen('data.txt', 'w');
$text .= date('d-m-Y H:i:s'); // H = hour (00-23), i = minutes; "m" would be the month
fwrite($fp, $text);
fclose($fp);
function get_data($url) {
$cookie = get_cookie($url);
if($cookie === null || $cookie === '')
{
// debug() is not defined in this snippet; error_log() is PHP's built-in logger
error_log('get_cookie: no cookie found for '.$url);
return false;
}
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIE, $cookie);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
function get_cookie($url)
{
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
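// Extract e.g. beget=begetok from document.cookie='beget=begetok'
if(!preg_match('/document\.cookie=\'([^\']+)\'/', $data, $m))
{
return null; // no cookie assignment found in the page
}
return $m[1];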
}
The website needs a cookie before it will serve the page you want.
Here is the scenario:
1) Request the first page, http://arenavision.in, with curl.
2) Use a regex to extract this value:
document.cookie='beget=begetok'
// ^^^^^^^^^^^^^
3) Send that cookie value with the next request.
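Step 2 can be sketched in PHP as follows. This is a hedged example, not the answer's exact code: it assumes the cookie is always written as a single-quoted literal of the form document.cookie='name=value', and extract_js_cookie() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: pulls the first "name=value" pair out of a
// document.cookie='...' assignment (step 2 above).
// Returns null when the page contains no such assignment.
function extract_js_cookie(string $html): ?array
{
    if (!preg_match("/document\\.cookie='([^'=;]+)=([^';]*)'/", $html, $m)) {
        return null;
    }
    return ['name' => $m[1], 'value' => $m[2]];
}

$page = "<script>function set_cookie(){document.cookie='beget=begetok'+'; path=/';}</script>";
$cookie = extract_js_cookie($page);
echo $cookie['name'] . '=' . $cookie['value'], PHP_EOL; // beget=begetok
```

Splitting name and value is optional; for the Cookie header in step 3 the whole beget=begetok string can be sent as-is.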
Here is a quick example using curl from the terminal:
curl 'http://arenavision.in/'
Outputs:
<html><head><script>function set_cookie(){var now = new Date();var time = now.getTime();time += 19360000 * 1000;now.setTime(time);document.cookie='beget=begetok'+'; expires='+now.toGMTString()+'; path=/';}set_cookie();location.reload();;</script></head><body></body></html>
Sending the value of document.cookie in the next request does the trick:
curl 'http://arenavision.in/' -H 'Cookie: beget=begetok'
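The same request can also be made with file_get_contents() from the question, by passing the cookie through an HTTP stream context. A minimal sketch, assuming beget=begetok is still the value the site expects; cookie_context() is a hypothetical helper name and the 5-second timeout is an arbitrary choice:

```php
<?php
// Sketch: build an HTTP stream context that sends a Cookie header,
// the equivalent of curl -H 'Cookie: beget=begetok'.
function cookie_context(string $cookie)
{
    return stream_context_create([
        'http' => [
            'method'  => 'GET',
            'header'  => "Cookie: " . $cookie . "\r\n",
            'timeout' => 5, // read timeout in seconds (arbitrary value)
        ],
    ]);
}

$context = cookie_context('beget=begetok');
// Network call, left commented out here:
// $text = file_get_contents('http://arenavision.in/schedule', false, $context);
```

Hard-coding the cookie only works while the site keeps using the same value; extracting it from the first response, as in the steps above, is more robust.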
Content on this site is generated dynamically, so a plain HTTP request doesn't return the full page you see in the browser.
The site is also protected by some cloud service, but you can provide the cookie in your request to get the full page:
You have to emulate a real user: read the cookie that the first response sets, then send it with your next request. Use cURL to achieve this.
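"Emulating a real user" here means repeating what the browser does: load the page, run set_cookie(), then reload with the cookie. The sketch below does that at most once and skips the second request when no cookie check is returned. It is illustrative only: $fetch is a hypothetical stand-in for the real HTTP call (cURL or file_get_contents) that takes a URL and an optional Cookie value, and $fakeFetch is a fake server, not the real site's behaviour:

```php
<?php
// Request the page; if the response is the JavaScript cookie check,
// read the cookie it would set and retry once with that cookie.
function fetch_like_browser(string $url, callable $fetch): string
{
    $body = $fetch($url, null);
    if (preg_match("/document\\.cookie='([^']+)'/", $body, $m)) {
        // Mirrors the browser's set_cookie(); location.reload();
        $body = $fetch($url, $m[1]);
    }
    return $body;
}

// Fake server for illustration: serves the schedule only when the cookie is sent.
$fakeFetch = function (string $url, ?string $cookie): string {
    return $cookie === 'beget=begetok'
        ? '<html><body>Schedule</body></html>'
        : "<script>document.cookie='beget=begetok'+'; path=/';location.reload();</script>";
};

echo fetch_like_browser('http://arenavision.in/schedule', $fakeFetch), PHP_EOL;
// <html><body>Schedule</body></html>
```

Keeping the HTTP call behind a callable lets the retry logic be checked without touching the network; in real use $fetch would wrap curl_exec() with CURLOPT_COOKIE set when a cookie is given.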