Reputation: 171
I want to download the pages behind a URL, which is easy enough. But the first page requires me to log in, as I normally do in a browser. HTTrack only downloads that first page, since it can't use my cookies or log in.
Is there any way for me to get around this?
Upvotes: 13
Views: 20113
Reputation: 711
This question was asked in 2013, so I'm not sure whether HTTrack supported cookies back then, but it definitely does now.
Instructions:
Look for cookies.txt, or if it's not there just create one and open it.
More info:
If you don't know how to look at your cookies, it's relatively simple...
You have to open the Dev Tools (F12) and navigate to the Cookies section:
For Firefox: F12 -> Storage -> Cookies
For Chrome: F12 -> Application -> Storage -> Cookies
If you are still having problems with HTTrack even after doing everything correctly, try copying your browser's User-Agent into your HTTrack configuration. By default HTTrack sends its own User-Agent, and some websites don't like it and reject those connections.
Example of a cookies.txt for HTTrack:
www.httrack.com TRUE / FALSE 1999999999 foo bar
www.example.com TRUE /folder FALSE 1999999999 JSESSIONID xxx1234
www.example.com TRUE /hello FALSE 1999999999 JSESSIONID yyy1234
IMPORTANT: Don't copy/paste this cookies.txt example. StackOverflow automatically converts TABS into SPACES, and cookies.txt just doesn't work with spaces. There is nothing I can do to fix this example, so only use it as a visual reference. Thanks to tugelblend for pointing this out in the comments.
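If you would rather generate the file than type the tabs by hand, here is a minimal Python sketch that writes tab-separated lines like the ones in the example. The function names `cookie_line` and `write_cookies` are made up for illustration, and the column meanings (domain, subdomain flag, path, secure flag, expiry, name, value) are an assumption based on the Netscape cookie file format that the example appears to follow.

```python
from pathlib import Path


def flag(value):
    """Render a boolean the way the example file does."""
    return "TRUE" if value else "FALSE"


def cookie_line(domain, path, name, value,
                include_subdomains=True, secure=False, expires=1999999999):
    """Build one cookie line with real TAB separators (hypothetical helper)."""
    fields = [domain, flag(include_subdomains), path, flag(secure),
              str(expires), name, value]
    return "\t".join(fields)


def write_cookies(filename, lines):
    """Write the lines to a file, one cookie per line."""
    Path(filename).write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Placeholder values copied from the visual example above.
    lines = [
        cookie_line("www.example.com", "/folder", "JSESSIONID", "xxx1234"),
        cookie_line("www.example.com", "/hello", "JSESSIONID", "yyy1234"),
    ]
    write_cookies("cookies.txt", lines)
```

Replace the domain, path, name, and value with what your browser's Dev Tools show for the logged-in session.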
Reference: http://httrack.kauler.com/help/Cookies
Upvotes: 22
Reputation: 9121
Adding to Frank Einstein's answer:
You might not need cookies.txt, as httrack also has a --headers option. So, first copy the relevant session cookie from the browser, and then you can use:
httrack --headers 'Cookie: SESSIONID=1234...' ...
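If you have several cookies to send, building the header by hand gets fiddly. Below is a small Python sketch that assembles the same kind of command line. The helper names are hypothetical, and `--user-agent` is assumed here to be the long form of httrack's `-F` option, so check `httrack --help` on your version. The script only prints the command; it does not run httrack.

```python
import shlex


def cookie_header(cookies):
    """Join name/value pairs into a single 'Cookie:' header line."""
    pairs = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return "Cookie: " + pairs


def httrack_command(url, cookies, user_agent=None):
    """Return the argument list for an httrack call (hypothetical helper)."""
    args = ["httrack", url, "--headers", cookie_header(cookies)]
    if user_agent:
        # Assumed long option for -F; verify against your httrack build.
        args += ["--user-agent", user_agent]
    return args


if __name__ == "__main__":
    # Placeholder URL and cookie value; use the ones from your own session.
    cmd = httrack_command("https://www.example.com/", {"SESSIONID": "1234"})
    print(shlex.join(cmd))  # shell-quoted, ready to paste into a terminal
```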
Upvotes: 1
Reputation: 4136
Try using cURL in PHP:
http://php.net/manual/en/book.curl.php
There are wrappers for this, like the class you can download from:
http://semlabs.co.uk/journal/object-oriented-curl-class-with-multi-threading
Use options such as:
require_once( 'CURL.php' ); // Change this to whatever the class file from the link above is called
$curl = new CURL();
$curl->retry = 2;
$opts = array(
    CURLOPT_USERAGENT => 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.3) Gecko/20091020 Linux Mint/8 (Helena) Firefox/3.5.3',
    // Read and write cookies in the same file so the session survives between requests
    CURLOPT_COOKIEFILE => 'fb.tmp',
    CURLOPT_COOKIEJAR => 'fb.tmp',
    CURLOPT_FOLLOWLOCATION => 1, // Follow redirects after login
    CURLOPT_RETURNTRANSFER => 1, // Return the body instead of printing it
    // Certificate checks are switched off here; turn them back on if you can
    CURLOPT_SSL_VERIFYHOST => 0,
    CURLOPT_SSL_VERIFYPEER => 0,
    CURLOPT_TIMEOUT => 20
);
$post_data = array( ); // Put your login POST data here
$opts[CURLOPT_POSTFIELDS] = http_build_query( $post_data );
$curl->addSession( 'https://www.facebook.com/messages', $opts );
$result = $curl->exec();
$curl->clear();
print_r( $result );
Note that sometimes you need to load a page first, so that a cookie gets set, before the site will let you log in.
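For comparison, here is a rough sketch of that load-first-then-log-in order in Python rather than PHP, using only the standard library. The helper names, URLs, and form field names are placeholders I made up; a real site will have its own login URL and fields, and may need extra hidden form values.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Create an opener that remembers cookies across requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_request(login_url, fields):
    """Build the login request; passing data makes urllib send a POST."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(login_url, data=data)


def login(opener, landing_url, login_url, fields, timeout=20):
    """Load a page first so the site can set its cookie, then post the form."""
    opener.open(landing_url, timeout=timeout).close()
    with opener.open(login_request(login_url, fields), timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    opener, jar = make_session()
    # Placeholder URLs and field names; replace with the site's real ones.
    body = login(opener,
                 "https://www.example.com/",
                 "https://www.example.com/login",
                 {"user": "me", "pass": "secret"})
    for cookie in jar:
        print(cookie.name, cookie.value)
```

The cookies printed at the end are the ones you would carry over to whatever does the actual download.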
Upvotes: -3