Reputation: 2713
$curl = curl_init("http://example.com/");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($curl, CURLOPT_HTTPHEADER, array(
    "Host: example.com",
    "Connection: keep-alive",
    "Upgrade-Insecure-Requests: 1",
    "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36",
    "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language: en-US,en;q=0.8",
));
curl_setopt($curl, CURLOPT_VERBOSE, TRUE);
$result = curl_exec($curl);
echo $result;
The response is
<html><title>You are being redirected...</title>
<noscript>Javascript is required. Please enable javascript before you are allowed to see this page.</noscript>
I'm reusing exactly the headers that the browser sends to the site.
How can a site know this is not a real browser? The error occurs when loading the main page so it's not like there is any authentication going on.
In fact, JavaScript is not even needed for the majority of the page's content. I can see it's loaded as standard HTML, but for some reason the entire page doesn't load if JavaScript is not enabled.
Any ideas? (sorry, can't share real site name).
Upvotes: 1
Views: 9162
Reputation: 5228
I have the same problem years later. Some old websites' developers create bogus security by hiding the PHP form submission in complicated JS files. The URL displayed in the browser/form is not the URL you actually post to; the real URL is hidden in the JS files.
Open the page source and look into the JS files.
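As a rough starting point for that search, here is a minimal sketch that lists the external script files a page references, so you know which JS files to read. The function name findScriptSources is hypothetical, and it assumes the dom extension is available; it only collects src attributes and does not try to find the hidden URL for you.

```php
<?php
// Hypothetical helper: collect the src attribute of every <script> tag.
function findScriptSources(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; silence libxml parse warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('script') as $script) {
        $src = $script->getAttribute('src');
        if ($src !== '') {
            // Inline scripts have no src and are skipped here.
            $sources[] = $src;
        }
    }
    return $sources;
}
```

You would pass it the HTML returned by curl_exec(), then fetch each listed file and look for the URL the form really posts to.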
Upvotes: 0
Reputation: 7911
To my knowledge, a minimum of 2 requests is needed to know whether a client has JavaScript enabled or not. Since this is cURL, and it can be set up as an "original" request, the response would not make any sense unless that website checks request headers like a hound dog.
As @zerkms mentioned, Chrome does send more headers than your cURL request:
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip, deflate, sdch
Accept-Language:en-US,en;q=0.8,nl;q=0.6
Cache-Control:max-age=0
Connection:keep-alive
Cookie:cookiedata
DNT:1
Host:example.com
Upgrade-Insecure-Requests:1
User-Agent:Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36
There are a couple of mismatches. First, Host:example.com does not have a space after the colon. Secondly, curl would take care of that header itself via the curl_init() function. I'm also missing DNT, Cache-Control, and Accept-Encoding, and the Accept-Language values differ.
In theory, a server cannot detect client settings but it can very well detect every header.
If I were building this software, for example, I would collect enough data to know what normal browser headers look like. If headers are missing, I could tell whether a request comes from a real user or not.
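To try the request with the headers listed as missing above, something like the following sketch could work. The function names browserLikeHeaders and fetchLikeBrowser are hypothetical, and the header values are simply copied from the lists in this thread. Host is left out on the assumption that curl derives it from the URL, and Accept-Encoding is left to CURLOPT_ENCODING so that curl only advertises encodings it can actually decode. There is no guarantee this is what the site checks.

```php
<?php
// Hypothetical helper: the question's headers plus the ones reported missing.
function browserLikeHeaders(): array
{
    return [
        'Connection: keep-alive',
        'Cache-Control: max-age=0',
        'Upgrade-Insecure-Requests: 1',
        'DNT: 1',
        'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36',
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.8',
    ];
}

// Hypothetical wrapper; returns the body as a string, or false on failure.
function fetchLikeBrowser(string $url)
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_HTTPHEADER, browserLikeHeaders());
    // Empty string: send Accept-Encoding for every encoding curl supports
    // and decode the response automatically.
    curl_setopt($curl, CURLOPT_ENCODING, '');
    // Store cookies and send them back on later requests.
    curl_setopt($curl, CURLOPT_COOKIEJAR, 'cookie.txt');
    curl_setopt($curl, CURLOPT_COOKIEFILE, 'cookie.txt');
    return curl_exec($curl);
}
```

Calling fetchLikeBrowser("http://example.com/") twice may matter here, since the first response can set a cookie that the second request then sends back.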
Upvotes: 1
Reputation: 4694
The site likely can't actually tell that it's not a browser making the request. The HTML <noscript> tag marks content that should be shown only when JavaScript is disabled or unsupported, which is why you see that message. The page seems not to load because the remote server appears to have sent you a meta-refresh/redirect page; the solution, as far as I can see, is to send the same request to wherever you're being redirected.
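If the page really is a meta-refresh redirect (an assumption; the snippet in the question is truncated, and the redirect may instead be done in JavaScript), the target could be pulled out of the HTML roughly like this. The function name extractMetaRefreshUrl is hypothetical, and the sketch assumes the dom extension is available.

```php
<?php
// Hypothetical helper: return the URL of a <meta http-equiv="refresh"> tag,
// or null if the page has none.
function extractMetaRefreshUrl(string $html): ?string
{
    $doc = new DOMDocument();

    // Silence libxml warnings caused by sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strcasecmp($meta->getAttribute('http-equiv'), 'refresh') !== 0) {
            continue;
        }
        // The content attribute looks like "0; url=http://example.com/next".
        $content = $meta->getAttribute('content');
        if (preg_match('/url\s*=\s*[\'"]?([^\'"]+)/i', $content, $m)) {
            return trim($m[1]);
        }
    }
    return null;
}
```

If it returns a URL, request that URL with the same headers and cookie file; if it returns null, the redirect is probably done some other way.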
Aside from that, however, there is in fact a way for a server to tell what's sending a request: the User-Agent header. This header is typically hardcoded in most browsers and sent with every request; it contains information about what the client is. It's not completely reliable (it can be spoofed, which is what you're doing), but at least it's something.
Upvotes: 0