Reputation: 3011
I'm trying to wrap my mind around some PHP web scraping using cURL. I recently picked up a short book on the topic, but am stuck on one of the tutorials and can't seem to find where the error is. The cookie.txt file is created, so I know that at least one portion of the function is executing properly.
I've tried using both the id and name attributes of the name and password input fields, without any luck. As far as I can tell, I'm also using the correct POST URL.
<?php
// Function to submit a form using the cURL POST method
function curlPost($postUrl, $postFields, $successString) {
    $useragent = 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3'; // Setting the user agent of a very old, yet popular browser
    $cookie = 'cookie.txt'; // Setting a cookie file to store cookies
    $ch = curl_init(); // Initializing cURL session
    // Setting cURL options
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); // Prevent cURL from verifying SSL certificate
    curl_setopt($ch, CURLOPT_FAILONERROR, TRUE); // Fail on HTTP status codes >= 400 instead of returning the error page
    curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE); // Start a new cookie session (ignore session cookies from a previous run)
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow Location: headers
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Returning transfer as a string
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); // Setting cookiefile
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie); // Setting cookiejar
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent); // Setting useragent
    curl_setopt($ch, CURLOPT_URL, $postUrl); // Setting URL to POST
    curl_setopt($ch, CURLOPT_POST, TRUE); // Setting method as POST
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postFields)); // Setting POST fields as a URL-encoded query string
    $results = curl_exec($ch); // Executing cURL session
    curl_close($ch); // Closing cURL session
    // Checking if login was successful by checking existence of string
    // (strict comparison, so a match at position 0 is not treated as a failure)
    if ($results !== FALSE && strpos($results, $successString) !== FALSE) {
        return $results;
    } else {
        return FALSE;
    }
}
$userEmail = '[email protected]'; // Setting your email address for site login
$userPass = 'password'; // Setting your password for site login
$postUrl = 'https://www.packtpub.com/'; // Setting URL to POST to
// Setting form input fields as 'name' => 'value'
$postFields = array(
    'name' => $userEmail,
    'password' => $userPass,
    'form_id' => 'packt-login-form-header'
);
$successString = 'You are logged in as';
$loggedIn = curlPost($postUrl, $postFields, $successString); // Executing curlPost login and storing results page in $loggedIn
?>
Upvotes: 0
Views: 267
Reputation: 478
I've tested the script under Linux and it works as expected, with two minimal corrections:
First, as hindmost mentioned, the path to the cookie file has to be absolute. You can either provide the full path or use something like this:
$cookie = dirname(__FILE__).'/cookie.txt';
OR
$cookie = __DIR__.'/cookie.txt'; // if PHP version >= 5.3.0
Either form builds the path dynamically from the directory of the file in which the function is declared.
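If you want the script to tell you early when the cookie file cannot be written, a small wrapper along these lines might help. This is only a sketch: cookiePath() is a hypothetical helper name, not something from the book or from cURL itself.

```php
<?php
// Hypothetical helper (not part of the original script): builds an absolute
// cookie-file path inside $dir and fails loudly if $dir is not writable.
function cookiePath(string $dir, string $file = 'cookie.txt'): string
{
    if (!is_dir($dir) || !is_writable($dir)) {
        throw new RuntimeException("Cookie directory is not writable: $dir");
    }
    // Strip any trailing slash so the separator is not doubled.
    return rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $file;
}

// Usage inside curlPost(), replacing the relative path:
// $cookie = cookiePath(__DIR__);
```

The directory check matters because cURL does not report an error from curl_exec() when it cannot write the cookie jar, so a permissions problem can otherwise go unnoticed.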
Second, you have to do something with the contents of the $loggedIn variable to see any output and to debug further. For example, you could add this at the end of your script:
var_dump($loggedIn);
This prints “bool(false)” on failure; otherwise it prints the response body, that is, the contents of the $results variable inside the function.
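Since “bool(false)” alone does not say why the request failed, it may also be worth asking cURL itself. The following is a rough sketch, assuming you call it inside curlPost() right after curl_exec() and before curl_close(); curlDiagnostics() is a hypothetical name of my own.

```php
<?php
// Hypothetical helper: summarizes the outcome of a cURL transfer.
// $ch is the cURL handle, $results is the return value of curl_exec($ch).
// Must be called BEFORE curl_close($ch), while the handle is still valid.
function curlDiagnostics($ch, $results): string
{
    if ($results === false) {
        // Transfer failed: report cURL's own error number and message.
        return 'cURL error ' . curl_errno($ch) . ': ' . curl_error($ch);
    }
    // Transfer completed: report the final HTTP status and body size.
    return 'HTTP status ' . curl_getinfo($ch, CURLINFO_HTTP_CODE)
        . ', ' . strlen($results) . ' bytes received';
}

// Usage inside curlPost():
// $results = curl_exec($ch);
// echo curlDiagnostics($ch, $results), PHP_EOL;
// curl_close($ch);
```

Because the script sets CURLOPT_FAILONERROR, an HTTP status of 400 or above makes curl_exec() return false, and the message from curl_error() then includes the status code. If the transfer succeeds but the function still returns false, the success string was simply not found in the page, which points at the form fields or the POST URL instead.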
Upvotes: 1