Reputation: 69
I use cURL / file_get_contents very often to get a page's source code.
However, there is one website where this is not working for me.
Here is the code:
<?php
$c = curl_init('https://plus.nl');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($c, CURLOPT_POST, true);
//curl_setopt(... other options you want...)
$html = curl_exec($c);
if (curl_error($c))
die(curl_error($c));
// Get the status code
$status = curl_getinfo($c, CURLINFO_HTTP_CODE);
curl_close($c);
echo $html;
?>
In my browser, it just keeps loading. When I try any other website, it works instantly. What's up with this website that it does not work?
Upvotes: 0
Views: 262
Reputation: 57306
EDIT: Having tried what you're doing, I can actually see the errors in the browser console. It's much simpler than X-Frame-Options security. The HTML refers to its JavaScript and CSS by relative paths. In your case the HTML is served from your own website, not from the original plus.nl, so the browser resolves those paths against your site and every request for CSS, JavaScript, images, etc. results in a 404 (Not Found).
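One common way to work around relative paths like these is to inject a `<base>` tag into the fetched HTML before echoing it, so the browser resolves relative URLs against the original site. The following is only a minimal sketch: `addBaseHref` is a hypothetical helper name, and whether it is enough for plus.nl is an assumption, since the site may still refuse cross-origin requests for other reasons.

```php
<?php
// Hypothetical helper: insert a <base> tag right after the opening <head> tag
// so that relative URLs in the fetched HTML resolve against the original site.
function addBaseHref(string $html, string $baseUrl): string
{
    $tag = '<base href="' . htmlspecialchars($baseUrl, ENT_QUOTES) . '">';

    // Match the first <head> or <head ...attributes> tag (but not <header>).
    $result = preg_replace_callback(
        '/<head(\s[^>]*)?>/i',
        function (array $m) use ($tag): string {
            return $m[0] . $tag;
        },
        $html,
        1,
        $count
    );

    // If no <head> tag was found, fall back to prepending the <base> tag.
    return $count > 0 ? $result : $tag . $html;
}
```

With the code from the question, this would be used as `echo addBaseHref($html, 'https://www.plus.nl/');` instead of `echo $html;`.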
Original answer (not applicable, based on my further investigation): Most likely, the answer lies with the X-Frame-Options header. The basic HTML is almost empty; everything else is loaded via JavaScript. Their X-Frame-Options header only allows the assets to be loaded if the URL in the browser is https://www.plus.nl/ - and in your case it's not, so none of the dynamic content can be loaded or executed.
Upvotes: 3
Reputation: 6732
I tried file_get_contents and it works on the site. However, the result is not very usable, since the site detects the lack of JavaScript. Setting the user agent with cURL didn't do the trick either.
I'm just getting the message
We werken momenteel aan de website. De huidige pagina werkt nog niet optimaal op mobiel.
which translates to:
We're currently working on the website. The current page does not yet work optimally on mobile devices.
So maybe your IP just got banned by them.
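For reference, a file_get_contents request of the kind described above might look like the sketch below. `fetchSource` is a hypothetical helper name and the user-agent string is a placeholder, not a value the site is known to accept. The timeout is there so that a slow or unresponsive server produces an error instead of a page that keeps loading.

```php
<?php
// Hypothetical helper: fetch a URL with file_get_contents, sending a
// browser-like User-Agent and giving up after a timeout instead of hanging.
function fetchSource(string $url, int $timeoutSeconds = 10): string
{
    // The 'http' context options are also used for https:// URLs.
    $context = stream_context_create([
        'http' => [
            'method'          => 'GET',
            'user_agent'      => 'Mozilla/5.0 (X11; Linux x86_64) ExampleFetcher/1.0', // placeholder
            'timeout'         => $timeoutSeconds, // read timeout in seconds
            'follow_location' => 1,               // follow redirects
        ],
    ]);

    // Suppress the warning and turn a failure into an exception instead.
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        throw new RuntimeException("Could not fetch $url");
    }

    return $body;
}
```

Calling `fetchSource('https://www.plus.nl/')` returns the raw HTML only; any content the site builds with JavaScript will still be missing.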
Upvotes: 0