user3193843

Reputation: 295

Log into website using curl failing

I am trying to log in to a remote site using curl (before doing some data scraping).

Using the code below, I get a cookie.txt file containing the following:

# Netscape HTTP Cookie File
# https://curl.haxx.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.

#HttpOnly_www.xxx.com   FALSE   /   TRUE    0   xxxv5   h_r4hXtn-gNAilZwhvHjYdE3Vr4HewhxtGrxja57LbW03-M9MLNqZSeiW7lQ2wRT9lZypNsAiX0gS0Ev1PrvNkGLmwL3B8ZmyOUMLYbTYbSW0y_aPGrIFlEp4skDzh0GJGIGtFHisCmQjEMlu0CJr0UEw2rCT9jbjzg0IyOnFYxNffaMPo229NZWV7HDfCK5M1_y6MPNvW_Kt-h4qTy8YmqGbfBwKxB-bulV78MSXU9ZWz_DVvdu6jXfPiHwCBDMV8FFBLaXm5rqYgNzvbsq8JLe1xkTPn1PNJhyizUa-hlwB6ev8HNwIwBpzs7406l6mL3VgyrDJpay6bHNoMtjh4fLwI7KapFANhFHfn57mg4
#HttpOnly_www.xxx.com   FALSE   /   TRUE    0   ASP.NET_SessionId   txakhdi15oeqxyfq53f44dts

When I log into the website manually, the cookie names match these. So I think the login request is reaching the site (otherwise the cookies would not be created), but when I output

echo 'HELLO html1 = '.$html1;

I see the page telling me I have entered the wrong username and password.

Code as follows:

ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);
error_reporting(E_ALL);
$username = 'xxx';
$password = 'xxx';
// echo 'STARTING';



//login form action url
$url="https://www.xxxx.com/Login"; 
$postinfo = "username=".$username."&password=".$password;

$cookie_file_path = "cookie.txt";

$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);

curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
//set the cookie the site has for certain features, this is optional
curl_setopt($ch, CURLOPT_COOKIE, "cookiename=0");
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $_SERVER['REQUEST_URI']);

curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_MAXREDIRS,5); // return into a variable
// curl_setopt($ch, CURLOPT_UPLOAD, true); 
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST" );
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postinfo);

// set content length
$headers[] = 'Content-length: 0';
$headers[] = 'Transfer-Encoding: chunked';
curl_setopt($ch, CURLOPT_HTTPHEADER , $headers);

$html1 = curl_exec($ch);
echo 'HELLO html1 = '.$html1;

I cannot show the site for security reasons (which may be a killer).

Can anyone point me in the right direction?

Upvotes: 0

Views: 368

Answers (3)

hanshenrik

Reputation: 21465

first off, this won't work: ini_set('display_startup_errors', 1); - the startup phase has already finished before the userland php code starts to run, so this setting is applied too late. it must be set in the php.ini config file. (not strictly true, but close enough: on windows you can do crazy registry hacks to enable it, you can set it with .user.ini files, etc. more info here http://php.net/manual/en/configuration.php )

second, an obvious error here is that you don't urlencode $username and $password in $postinfo = "username=".$username."&password=".$password; - if the username OR password contains any characters with special meanings in urlencoded format, you'll send the wrong credentials and won't get logged in (this includes &, =, +, %, spaces, and other characters). a fixed version would look like $postinfo = "username=".urlencode($username)."&password=".urlencode($password);
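
to make the second point concrete, here is a minimal sketch of what goes wrong. the credentials are made up for illustration; parse_str() stands in for how a server would decode the body.

```php
<?php
// Hypothetical credentials containing characters that are special in
// application/x-www-form-urlencoded data.
$username = 'user@example.com';
$password = 'p&ss=w rd+1';

// Unencoded: the "&" and "=" inside the password split it into extra fields.
$broken = "username=" . $username . "&password=" . $password;

// Encoded by hand, as suggested above.
$manual = "username=" . urlencode($username) . "&password=" . urlencode($password);

// http_build_query() encodes every value and joins the pairs in one call.
$built = http_build_query(['username' => $username, 'password' => $password]);

// Decode both bodies the way a server would.
parse_str($broken, $seenWhenBroken);
parse_str($built, $seenWhenEncoded);

var_dump($seenWhenBroken['password']);  // only the part before the "&" survives
var_dump($seenWhenEncoded['password']); // the full original password
```

with the default settings, the hand-encoded string and the http_build_query() result are identical, so either form can be passed to CURLOPT_POSTFIELDS.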

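third, don't use CURLOPT_CUSTOMREQUEST for POST requests, just use CURLOPT_POST. CURLOPT_POST already sets the method, and a custom request string is reused on every redirect that CURLOPT_FOLLOWLOCATION follows.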

fourth, your Content-length header is outright lying. the correct length is actually 'Content-length: '.strlen($postinfo) - which with your code, is definitely not 0 - but you shouldn't set this header at all, curl will do it for you if you don't, and unlike you, curl won't mess up the code calculating the size, so get rid of the entire line.

fifth, this code is also wrong: $headers[] = 'Transfer-Encoding: chunked'; your curl code here is NOT using chunked transfers, and if it were, curl would send that header automatically, so get rid of it.
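
putting points three to five together, a cleaned-up option set might look roughly like this. it is a sketch, not a tested login: buildLoginOptions() is a hypothetical helper name, the URL and field names are the placeholders from the question, and the SSL options are simply left at curl's defaults.

```php
<?php
// Sketch only: buildLoginOptions() is a hypothetical helper, not part of curl.
function buildLoginOptions(string $url, string $postBody, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,        // sets the method; no CURLOPT_CUSTOMREQUEST
        CURLOPT_POSTFIELDS     => $postBody,   // curl derives Content-Length from this
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written here when the handle is closed or freed
        CURLOPT_COOKIEFILE     => $cookieFile, // turns on the cookie engine and reads saved cookies
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_RETURNTRANSFER => true,
        // No CURLOPT_HTTPHEADER: Content-Length and Transfer-Encoding are left to curl.
    ];
}

$options = buildLoginOptions(
    'https://www.xxxx.com/Login', // placeholder URL from the question
    http_build_query(['username' => 'xxx', 'password' => 'xxx']),
    __DIR__ . '/cookie.txt'
);

// To send the request:
// $ch = curl_init();
// curl_setopt_array($ch, $options);
// $html = curl_exec($ch);
```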

sixth, don't just call curl_setopt and ignore the result. if there's an error setting any of your options, curl_setopt will return bool(false). you should watch out for such errors, use curl_error to extract the error message, and throw an exception if such an error occurs, instead of what your code is doing right now, which is silently ignoring any curl_setopt errors. use something like function ecurl_setopt($ch,int $option, $value){if(!curl_setopt($ch,$option,$value)){throw new \RuntimeException('curl_setopt failed!: '.curl_error($ch));}}
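
here is the same ecurl_setopt helper spread over several lines, with a short usage example. the URL is the placeholder from the question and the timeout value is arbitrary.

```php
<?php
// Wrapper around curl_setopt() that turns a false return into an exception.
function ecurl_setopt($ch, int $option, $value): void
{
    if (!curl_setopt($ch, $option, $value)) {
        throw new \RuntimeException('curl_setopt failed!: ' . curl_error($ch));
    }
}

$ch = curl_init();
ecurl_setopt($ch, CURLOPT_URL, 'https://www.xxxx.com/Login'); // placeholder URL from the question
ecurl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
ecurl_setopt($ch, CURLOPT_TIMEOUT, 10); // arbitrary example value
```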

if fixing all of these problems is not enough to log in, you're not giving us enough information to help you any further. what does the browser's http login request look like? or what is the login url?

Upvotes: 1

gene

Reputation: 146

It is not as simple as reading the HTML page using curl. You need to supply a POST value for the submit button. If there is any javascript that executes prior to the activation of ACTION script, then that has to be looked at as well.
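
As a rough sketch of collecting those extra values, you could parse the login page and copy every named input (hidden fields and the submit button included) into the POST data. The markup, the field names, and the collectFormFields() helper below are all made up for illustration; the real form will differ.

```php
<?php
// Hypothetical login page markup; the real form's field names will differ.
$loginPageHtml = <<<'HTML'
<form action="/Login" method="post">
  <input type="hidden" name="__RequestVerificationToken" value="abc123">
  <input type="text" name="username" value="">
  <input type="password" name="password" value="">
  <input type="submit" name="submit" value="Log in">
</form>
HTML;

// collectFormFields() is an illustrative helper, not a library function.
function collectFormFields(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Start from what the form would send, then fill in the credentials.
$fields = collectFormFields($loginPageHtml);
$fields['username'] = 'xxx';
$fields['password'] = 'xxx';
$postBody = http_build_query($fields);
```

In practice the HTML would come from a first GET request made with the same cookie file, so that any token in the form matches the session cookie.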

Usually you get better results if you use Selenium. See http://www.seleniumhq.org/

EDIT1:

If the server is rejecting your post string try: curl_setopt($handle, CURLOPT_POSTFIELDS, http_build_query($data));

Upvotes: 0

ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);
error_reporting(E_ALL);
$username = 'xxx';
$password = 'xxx';    
//login form action url
$url="https://www.xxxx.com/Login"; 
$postinfo = array("username"=>$username,"password"=>$password);
$cookie_file_path = "cookie.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch,CURLOPT_SSL_VERIFYHOST,false);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false);
curl_setopt($ch,CURLOPT_COOKIEFILE,$cookie_file_path);
curl_setopt($ch,CURLOPT_COOKIEJAR,$cookie_file_path);
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_REFERER, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postinfo);
$html = curl_exec($ch);
echo $html;

The code above should work. If there is still an issue, check the permissions on the cookie.txt file.
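
A quick way to check the permissions from PHP is sketched below. cookieJarIsWritable() is a hypothetical helper name, and using an absolute path is a suggestion rather than something the original code does.

```php
<?php
// Hypothetical helper: can curl create or overwrite a cookie jar at $path?
function cookieJarIsWritable(string $path): bool
{
    // An existing file must itself be writable; otherwise the directory
    // must be writable so that curl can create the file.
    return file_exists($path)
        ? is_writable($path)
        : is_writable(dirname($path));
}

// An absolute path avoids surprises when the working directory differs
// between the command line and the web server.
$cookie_file_path = __DIR__ . '/cookie.txt';

if (!cookieJarIsWritable($cookie_file_path)) {
    echo 'Cookie file is not writable: ' . $cookie_file_path . PHP_EOL;
}
```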

Also, if hidden data needs to be sent along with the POST, you can find it using the Firefox Live HTTP Headers plugin.

Upvotes: 0
