Reputation: 2238
On Ubuntu, I am trying to download a file from a script using wget. I'm building a program to download this file every day and load it into a Hadoop cluster.
However, wget fails with the following message:
wget http://www.nseindia.com/content/historical/EQUITIES/2012/JUN/cm15JUN2012bhav.csv.zip
--2012-06-16 03:37:30-- http://www.nseindia.com/content/historical/EQUITIES/2012/JUN/cm15JUN2012bhav.csv.zip
Resolving www.nseindia.com... 122.178.225.48, 122.178.225.18
Connecting to www.nseindia.com|122.178.225.48|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2012-06-16 03:37:30 ERROR 403: Forbidden.
When I try the same URL in Firefox or a similar browser, it works just fine. And no, there is no license agreement or anything like that involved.
Am I missing something basic about wget?
Upvotes: 13
Views: 60836
Reputation:
Some sites simply block downloads from clients that send wget's default User-Agent. You can override it with -U:
wget -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4' http://yourURL.com
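If a script makes many wget calls, repeating -U on each one gets noisy; wget can also take the User-Agent from a startup file through its user_agent setting. A minimal sketch, assuming a job-specific rc file at a hypothetical path so that ~/.wgetrc stays untouched:

```shell
#!/bin/sh
set -eu

# Hypothetical location for a job-specific wgetrc
rc="${TMPDIR:-/tmp}/nse-wgetrc"

# user_agent replaces wget's default "Wget/<version>" User-Agent header
cat > "$rc" <<'EOF'
user_agent = Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4
EOF
```

Then run WGETRC="$rc" wget http://yourURL.com; when the WGETRC variable is set, wget reads that file in place of ~/.wgetrc.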
Upvotes: 0
Reputation: 29
I use curl -O <URL>
instead, because curl supports more protocols than wget (which handles HTTP, HTTPS and FTP).
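One thing to watch with curl in an unattended script: by default it exits 0 even when the server answers 403, and -O then saves the error page under the target file name. curl also sends its own curl/<version> User-Agent, which a site may block the same way it blocks wget. A minimal sketch of a wrapper; the function name fetch and the Mozilla/5.0 string are illustrative choices, not anything the site requires:

```shell
#!/bin/sh
set -eu

# Download $1 into the current directory, named after the remote file.
#   -f  exit non-zero (22) on HTTP errors such as 403 instead of saving the error page
#   -L  follow redirects
#   -A  send a browser-like User-Agent instead of curl's default "curl/<version>"
#   -O  keep the remote file name, as wget does by default
fetch() {
  curl -f -L -A 'Mozilla/5.0' -O "$1"
}
```

For example: fetch http://www.nseindia.com/content/historical/EQUITIES/2012/JUN/cm15JUN2012bhav.csv.zip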
Upvotes: 1
Reputation: 1469
Another technique web apps or web servers may use is to check the Referer request header (HTTP's historical spelling of "referrer"). In addition to specifying the user agent, you may need to supply the referring URL.
For example:
wget --referer http://freestockphotos.com/Scenery1.html http://freestockphotos.com/SKY/TreeSunset.jpg
This host appears to reject requests for the target file if they were not made while navigating from the 'Scenery1.html' page.
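Both headers can be sent in the same call. A minimal sketch, assuming the freestockphotos.com URLs from this answer still behave as described; the function name fetch_from_page is hypothetical:

```shell
#!/bin/sh
set -eu

# fetch_from_page PAGE_URL FILE_URL
# Requests FILE_URL as if the user had followed a link on PAGE_URL:
#   --referer  sets the Referer request header
#   -U         replaces wget's default User-Agent
fetch_from_page() {
  wget --referer="$1" -U 'Mozilla/5.0' "$2"
}
```

Usage: fetch_from_page http://freestockphotos.com/Scenery1.html http://freestockphotos.com/SKY/TreeSunset.jpg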
Upvotes: 1
Reputation: 7674
The site blocks wget because, by default, wget identifies itself with a distinctive User-Agent header (Wget/<version>). To use a different User-Agent in wget, try:
wget -U Mozilla/5.0 http://www.nseindia.com/content/historical/EQUITIES/2012/JUN/cm15JUN2012bhav.csv.zip
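In a script you may prefer to send the browser-like User-Agent only when it is needed. wget exits with status 8 when the server returns an error response (such as the 403 in the question), so a fallback can key off that status. A minimal sketch; the function name fetch is illustrative, and note that status 8 also covers other server errors such as 404, so those get retried too:

```shell
#!/bin/sh
set -eu

# Try wget's default User-Agent first; retry with a browser-like one only
# when wget reports a server error response (exit status 8).
fetch() {
  local status=0
  wget -q "$1" || status=$?
  if [ "$status" -ne 8 ]; then
    return "$status"   # success (0) or a non-HTTP failure: don't retry
  fi
  wget -q -U 'Mozilla/5.0' "$1"
}
```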
Upvotes: 15
Reputation: 11890
Use:
wget -U mozilla http://www.nseindia.com/content/historical/EQUITIES/2012/JUN/cm15JUN2012bhav.csv.zip
Some sites simply block downloads from wget's default User-Agent. I just downloaded that file with this command, and it works.
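For the daily job described in the question, the date-dependent URL can be built and fetched with this User-Agent. A minimal sketch: the path pattern (.../EQUITIES/<YYYY>/<MON>/cm<DD><MON><YYYY>bhav.csv.zip) is an assumption inferred from the single URL in the question, bhav_url and fetch_bhav are hypothetical names, and date -d requires GNU date, as shipped with Ubuntu.

```shell
#!/bin/sh
set -eu

# bhav_url DATE -> URL of that day's bhavcopy zip.
# ASSUMPTION: pattern inferred from one example URL; verify before relying on it.
bhav_url() {
  local day mon year
  day=$(date -d "$1" +%d)
  # LC_ALL=C forces English month abbreviations; tr upper-cases Jun -> JUN
  mon=$(LC_ALL=C date -d "$1" +%b | tr '[:lower:]' '[:upper:]')
  year=$(date -d "$1" +%Y)
  printf 'http://www.nseindia.com/content/historical/EQUITIES/%s/%s/cm%s%s%sbhav.csv.zip\n' \
    "$year" "$mon" "$day" "$mon" "$year"
}

# fetch_bhav DATE: download that day's file with a non-wget User-Agent
fetch_bhav() {
  wget -U mozilla "$(bhav_url "$1")"
}
```

From cron, something like fetch_bhav yesterday followed by hadoop fs -put on the downloaded zip would cover the question's use case; the HDFS destination is up to you, and days without trading will have no file to fetch.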
Upvotes: 6