Reputation: 363
I want to write a crawler to extract information from a corporate website that is accessible only from the company intranet. I can access that website from any browser installed on my laptop, but I get an HTTP 401 when I try to reach it from any other web client (curl, Node.js request, ...).
I've tried playing with various proxy and basic-auth settings, but I could not find anything that works.
As I'm on a Windows system and I suspect the IE network settings may be involved here, I've also tried importing the proxy settings from IE with
netsh winhttp import proxy source=ie
but it did not make any difference.
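What that import actually set can be double-checked with
netsh winhttp show proxy
which just prints the current WinHTTP proxy configuration.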
What have I missed? How can I determine what makes local browsers able to reach that website when other web clients cannot? I've looked at the request in the Chrome developer tools but could not find anything helpful there.
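Is there something in the 401 response itself that would tell me what kind of authentication the server expects? For example, dumping the response headers of the failing request with
$ curl -s -D - -o /dev/null ${URL} | grep -i "www-authenticate"
should show the WWW-Authenticate header(s) naming the schemes the server accepts, but I'm not sure what to look for there.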
Upvotes: 0
Views: 44
Reputation: 363
I finally found the solution. I was missing the --ntlm flag in curl.
$ curl -s -o /dev/null -w "%{http_code}" -u "${USER}:${PASS}" ${URL}
401
$ curl --ntlm -s -o /dev/null -w "%{http_code}" -u "${USER}:${PASS}" ${URL}
200
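The browsers presumably work because, on Windows, they negotiate NTLM/Kerberos transparently for intranet sites using the logged-in account, whereas curl only attempts NTLM when explicitly asked to with --ntlm (with plain -u it sends Basic credentials, which this server rejects).
For the crawling part, here is a minimal sketch; urls.txt is a hypothetical file with one intranet URL per line, and the output file name is derived naively from the URL, so adapt as needed:
# urls.txt is a hypothetical list of intranet pages, one URL per line
while read -r url; do
  # fetch each page with NTLM auth and save it under a name derived from the URL
  curl --ntlm -s -u "${USER}:${PASS}" "$url" -o "$(basename "$url").html"
done < urls.txt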
Upvotes: 0