Raghav

Reputation: 2238

wget not working

On Ubuntu, I am trying to download a file from a script using wget. I am building a program to download this file every day and load it into a Hadoop cluster.

However, wget fails with the following message:

wget http://www.nseindia.com/content/historical/EQUITIES/2012/JUN/cm15JUN2012bhav.csv.zip
--2012-06-16 03:37:30--  http://www.nseindia.com/content/historical/EQUITIES/2012/JUN/cm15JUN2012bhav.csv.zip
Resolving www.nseindia.com... 122.178.225.48, 122.178.225.18
Connecting to www.nseindia.com|122.178.225.48|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2012-06-16 03:37:30 ERROR 403: Forbidden.

When I try the same URL in Firefox or another browser, it works just fine. And yes, there is no license agreement or anything similar involved.

Am I missing something basic about wget?

Upvotes: 13

Views: 60836

Answers (5)

user3828272

Reputation:

Some sites simply refuse downloads from wget's default user agent. Try sending a browser-like one instead:

wget -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4' http://yourURL.com

Upvotes: 0

Anas

Reputation: 29

I use curl -O <URL> because wget builds compiled without SSL/TLS support cannot fetch HTTPS URLs, and curl handles some other protocols that wget does not.

Upvotes: 1

David Vezzani

Reputation: 1469

Another technique web apps or web servers may use is to check the value of the 'Referer' request header (the header name is spelled with a single 'r'). In addition to specifying the user agent, it may be necessary to supply the referrer URL.

e.g.,

wget --referer http://freestockphotos.com/Scenery1.html http://freestockphotos.com/SKY/TreeSunset.jpg

This host appears to reject requests for the target file if they were not made while navigating from the 'Scenery1.html' page.
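Since a server may check both headers, here is a minimal sketch that sends a Referer and a browser-like User-Agent in the same request. The helper name fetch_with_referer and the DRY_RUN switch are made up for illustration, and 'Mozilla/5.0' is just a placeholder value; whether a given site accepts it is an assumption.

```shell
# Sketch: send both a Referer and a browser-like User-Agent with wget.
# fetch_with_referer and DRY_RUN are hypothetical names, not part of wget.
# Usage: fetch_with_referer REFERER_URL TARGET_URL
fetch_with_referer() {
    referer="$1"
    url="$2"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it (handy for checking a script).
        printf 'wget --referer=%s -U %s %s\n' "$referer" "Mozilla/5.0" "$url"
    else
        wget --referer="$referer" -U 'Mozilla/5.0' "$url"
    fi
}
```

For the example above, that would be: fetch_with_referer http://freestockphotos.com/Scenery1.html http://freestockphotos.com/SKY/TreeSunset.jpg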

Upvotes: 1

enderskill

Reputation: 7674

The site most likely blocks wget because, by default, wget identifies itself with a "Wget/<version>" user-agent string rather than a browser's. To send a different user-agent with wget, try:

wget -U Mozilla/5.0 http://www.nseindia.com/content/historical/EQUITIES/2012/JUN/cm15JUN2012bhav.csv.zip
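For the daily download mentioned in the question, something along these lines might work as a starting point. It is only a sketch: the URL pattern is inferred from the single example URL in the question, and the function names build_url and fetch_bhavcopy are invented here, so check both against the real site before relying on them.

```shell
# Sketch: build the daily file URL and fetch it with a browser-like User-Agent.
# Assumption: every day's file follows the pattern of the one URL in the question.

# build_url DAY MON YEAR, e.g. build_url 15 JUN 2012
build_url() {
    # Arguments: $1 = two-digit day, $2 = upper-case month abbreviation, $3 = year
    printf 'http://www.nseindia.com/content/historical/EQUITIES/%s/%s/cm%s%s%sbhav.csv.zip\n' \
        "$3" "$2" "$1" "$2" "$3"
}

# fetch_bhavcopy DAY MON YEAR: download with -U so the server does not see "Wget/<version>".
fetch_bhavcopy() {
    wget -U 'Mozilla/5.0' "$(build_url "$1" "$2" "$3")"
}

# Example call for today's date. LC_ALL=C keeps the month abbreviation in English,
# and tr upper-cases it to match the URL:
# fetch_bhavcopy "$(date +%d)" "$(LC_ALL=C date +%b | tr '[:lower:]' '[:upper:]')" "$(date +%Y)"
```

Keeping the URL construction separate from the download makes it easy to check the generated URL before running wget from cron.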

Upvotes: 15

Zagorax

Reputation: 11890

Use:

wget -U mozilla http://www.nseindia.com/content/historical/EQUITIES/2012/JUN/cm15JUN2012bhav.csv.zip

Some sites simply refuse downloads from wget's default user agent. I just downloaded that file with this command, and it works.

Upvotes: 6
