Cesarg2199

Reputation: 579

PHP cURL failed to load response data

I am trying to scrape data with PHP, but the URL I need to access requires POST data.

<?php 

//set POST variables
$url = 'https://www.ncaa.org/';
//$url = 'https://web3.ncaa.org/hsportal/exec/hsAction?hsActionSubmit=searchHighSchool';

// This is the data to POST to the form. The KEY of the array is the name of the field. The value is the value posted.
$data_to_post = array();
$data_to_post['hsCode'] = '332680';
$data_to_post['state'] = '';
$data_to_post['city'] = '';
$data_to_post['name'] = '';
$data_to_post['hsActionSubmit'] = 'Search';

// Initialize cURL
$curl = curl_init();

// Set the options
curl_setopt($curl,CURLOPT_URL, $url);

// This sets the number of fields to post
curl_setopt($curl,CURLOPT_POST, sizeof($data_to_post));

// This is the fields to post in the form of an array.
curl_setopt($curl,CURLOPT_POSTFIELDS, $data_to_post);

//execute the post
$result = curl_exec($curl);

//close the connection
curl_close($curl);

?>

When I try the second $url, where the actual information is hosted, the request returns "failed to load response data", although the same script can access the NCAA home page. Why do I get "failed to load response data" even though I am sending the correct form data?

Upvotes: 0

Views: 858

Answers (2)

Andy Ryu

Reputation: 79

HTTPS connections with curl may need one special option turned off: CURLOPT_SSL_VERIFYPEER.

// Initialize cURL
$curl = curl_init();

// Set the options
curl_setopt($curl,CURLOPT_URL, $url);

// This option must be FALSE
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);

// Send the request as an HTTP POST (CURLOPT_POST is a boolean flag, not a field count)
curl_setopt($curl, CURLOPT_POST, true);

// This is the fields to post in the form of an array.
curl_setopt($curl,CURLOPT_POSTFIELDS, $data_to_post);

//execute the post
$result = curl_exec($curl);

//close the connection
curl_close($curl);

Upvotes: 0

Barmar

Reputation: 781058

The site apparently checks for a recognized user agent. By default PHP curl doesn't send a User-Agent header. Add

curl_setopt($curl, CURLOPT_USERAGENT, 'curl/7.21.4');

and the script returns a response. However, in this case, the response says that it requires a newer browser than the one you have. So you should copy the user agent string from a real browser, e.g.

curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36');

Also, it requires the parameters to be sent in application/x-www-form-urlencoded format. When you use an array as the argument to CURLOPT_POSTFIELDS it uses multipart/form-data. So change that line to:

curl_setopt($curl,CURLOPT_POSTFIELDS, http_build_query($data_to_post));

to convert the array to a URL-encoded string.
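As a small illustration of what http_build_query() produces for the fields in the question, here is a minimal sketch that only builds and prints the encoded string, without making any request:

```php
<?php
// Same fields as in the question's script.
$data_to_post = [
    'hsCode'         => '332680',
    'state'          => '',
    'city'           => '',
    'name'           => '',
    'hsActionSubmit' => 'Search',
];

// http_build_query() URL-encodes each key and value and joins the pairs with "&".
// Passing this string (instead of the array) to CURLOPT_POSTFIELDS makes curl
// send the body as application/x-www-form-urlencoded.
$encoded = http_build_query($data_to_post);

echo $encoded, PHP_EOL;
// hsCode=332680&state=&city=&name=&hsActionSubmit=Search
```

Fields with empty string values are still included in the encoded body.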

And in the URL, leave out ?hsActionSubmit=searchHighSchool, as that parameter is sent in the POST fields.

The final, working script looks like this:

<?php
//set POST variables
//$url = 'https://www.ncaa.org/';
$url = 'https://web3.ncaa.org/hsportal/exec/hsAction';

// This is the data to POST to the form. The KEY of the array is the name of the field. The value is the value posted.
$data_to_post = array();
$data_to_post['hsCode'] = '332680';
$data_to_post['state'] = '';
$data_to_post['city'] = '';
$data_to_post['name'] = '';
$data_to_post['hsActionSubmit'] = 'Search';

// Initialize cURL
$curl = curl_init();

// Set the options
curl_setopt($curl,CURLOPT_URL, $url);

// Send the request as an HTTP POST (CURLOPT_POST is a boolean flag, not a field count)
curl_setopt($curl, CURLOPT_POST, true);

// These are the fields to post, converted to a URL-encoded string.
curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($data_to_post));

// Send a real browser's User-Agent string
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36');

//execute the post
$result = curl_exec($curl);

//close the connection
curl_close($curl);
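
If a request like this comes back empty again, it may help to capture the response and any curl error instead of letting curl_exec() print directly. The following is only a sketch: post_form() and the keys of its return array are hypothetical names introduced here, not part of the script above.

```php
<?php
/**
 * Hypothetical helper (not part of the original script): POSTs URL-encoded
 * form fields and reports what curl saw instead of failing silently.
 */
function post_form(string $url, array $fields, string $userAgent): array
{
    $curl = curl_init();
    curl_setopt_array($curl, [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
    ]);

    $body = curl_exec($curl); // false if the transfer itself failed

    $report = [
        'body'   => $body === false ? null : $body,
        'error'  => $body === false ? curl_error($curl) : null,
        'status' => curl_getinfo($curl, CURLINFO_HTTP_CODE), // 0 if no response arrived
    ];

    curl_close($curl);
    return $report;
}
```

Calling post_form($url, $data_to_post, $userAgentString) with the values from the script above and inspecting the 'error' and 'status' entries should show whether the failure is at the connection level or an HTTP-level rejection by the site.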

Upvotes: 1
