Mason

Reputation: 7103

Wget and cURL aren't working with Wikipedia

I'm trying to download the source of a particular Wikipedia article to my computer, but wget and curl aren't working and I'm not sure why. Whenever I run something like wget http://en.wikipedia.org/wiki/List_of_current_NFL_team_rosters or curl http://en.wikipedia.org/wiki/List_of_current_NFL_team_rosters, I get gibberish (the same with both curl and wget).

First line of the output I get: ??N?????g???????^??L??~???IR?OX/?џ??X???4????b???m??Jk??o߾5E_S???D?xT????y???>??b?C?g?B?#?}????ŏ?Hv?K?dڛ?L˿l?K??,???T?c????n?????F*???'???w??z??d񧿜??? ???Y1Id?z?:7C?'W2??(?%>?~ԫ?|~7??4?%qz?r???H?]??P?PH 77I??Z6~{z??UG?~???]?.?#?G?F\????ӓ???8??ߞ?

Any ideas on why this might be happening?

Upvotes: 0

Views: 1700

Answers (3)

Gangadhar

Reputation: 1903

The reason you are getting gzipped data is that by default Wikipedia sends its pages gzip-compressed. If you check the headers of the response (you can do this in a tool like Fiddler), you'll see something like:

HTTP/1.0 200 OK
Date: Tue, 08 May 2012 03:45:40 GMT
Server: Apache
X-Content-Type-Options: nosniff
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
Content-Language: en
Vary: Accept-Encoding,Cookie
Last-Modified: Tue, 08 May 2012 02:33:41 GMT
Content-Length: 83464
Content-Type: text/html; charset=UTF-8
Age: 6415
X-Cache: HIT from cp1008.eqiad.wmnet
X-Cache-Lookup: HIT from cp1008.eqiad.wmnet:3128
X-Cache: MISS from cp1018.eqiad.wmnet
X-Cache-Lookup: MISS from cp1018.eqiad.wmnet:80
Connection: close
Content-Encoding: gzip

The last header line, Content-Encoding: gzip, is the clue to what you are seeing. So you can stream the output from Wikipedia and pipe it through gunzip to get the readable HTML.
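For example, a minimal sketch (the output filename rosters.html is just an illustration; -q -O - makes wget write the raw response bytes to stdout so gunzip can decompress them):

wget -q -O - http://en.wikipedia.org/wiki/List_of_current_NFL_team_rosters | gunzip > rosters.html

Note this only works while the server actually sends gzipped bytes; if it ever responds uncompressed, gunzip will fail with a "not in gzip format" error.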

Upvotes: 2

Pavel Strakhov

Reputation: 40502

I guess the problem is with your terminal. Try this:

wget -q -O - http://en.wikipedia.org/wiki/List_of_current_NFL_team_rosters
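Here -q suppresses wget's status output and -O - writes the download to standard output instead of a file, so you can see, redirect, or pipe exactly the bytes that were received.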

Upvotes: 1

Zheng Kai

Reputation: 3635

curl can request gzip and transparently decompress the response with --compressed:

curl --compressed http://en.wikipedia.org/wiki/List_of_current_NFL_team_rosters

For wget, see: http://www.commandlinefu.com/commands/view/7180/get-gzip-compressed-web-page-using-wget
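Roughly, a sketch in the spirit of that link (not necessarily its exact command; older wget has no --compressed equivalent, so you request gzip explicitly and decompress the stream yourself):

wget -q -O - --header='Accept-Encoding: gzip' http://en.wikipedia.org/wiki/List_of_current_NFL_team_rosters | gunzip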

Upvotes: 3
