XDGFX

Reputation: 41

wget Fails to Download Website (ERROR 0: no description)

I'm trying to mirror the whole website at http://opposedforces.com/parts/impreza/en_g11/type_63/

Accessing it through a browser (Firefox, w3m) or Postman works fine and returns the HTML file. Requests made with wget, cURL, the Python requests module, and HTTrack all fail.

wget specifically fails with:

↪ wget --mirror -p --convert-links  "http://opposedforces.com/parts/impreza/en_g11/type_63/"
--2021-02-03 20:48:29--  http://opposedforces.com/parts/impreza/en_g11/type_63/
Resolving opposedforces.com (opposedforces.com)... 138.201.30.59
Connecting to opposedforces.com (opposedforces.com)|138.201.30.59|:80... connected.
HTTP request sent, awaiting response...  0
2021-02-03 20:48:29 ERROR 0: (no description).

Converted links in 0 files in 0 seconds.

It seemingly returns no information. Originally I thought some JavaScript was generating the HTML, but I can't find any JS using the Firefox developer tools, and I would assume Postman would not work if that were the case.

Any ideas how to get around this? Ideally I can use wget to download this and all sub-pages, but alternative solutions are also welcome.

Upvotes: 1

Views: 722

Answers (1)

darnir

Reputation: 5190

This is one of those times when the website is completely and absolutely broken. It is unfortunate that web browsers go to great lengths to support such broken web pages.

The problem is that the server sends a broken response. This is the response I see:

---response begin---
HTTP/1.1 000 
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 44892
Expires: -1
Server: Microsoft-IIS/7.5
X-AspNet-Version: 2.0.50727
Set-Cookie: ASP.NET_SessionId=gxhoir45jpd43545iujdpiru; path=/; HttpOnly
X-Powered-By: ASP.NET
Date: Fri, 05 Feb 2021 09:26:26 GMT

See? It returns an HTTP/1.1 000 response, and 000 is not a status code that exists in the spec (valid codes start with a digit from 1 to 5). Web browsers seem to just accept it as a 200 response and move on. Wget doesn't.
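If you want to see the raw response yourself, a dump like the one above can be obtained from Wget's debug output. The following is only a sketch: the function name dump_response_headers is made up for illustration, and it assumes your Wget build has debug support (-d) compiled in.

```shell
# Hypothetical helper: print only the server's raw response headers.
# -d turns on debug output (written to stderr), -O /dev/null discards any body.
# sed keeps just the lines between Wget's response markers.
dump_response_headers() {
  wget -d -O /dev/null "$1" 2>&1 \
    | sed -n '/---response begin---/,/---response end---/p'
}
```

Calling dump_response_headers "http://opposedforces.com/parts/impreza/en_g11/type_63/" should then show the status line the server actually sent.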

But you can get around it by using the --content-on-error option, which asks Wget to download the content regardless of the response code.
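Applied to the command from the question, that would look roughly like the sketch below. The function name mirror_ignoring_status is only for illustration. I have not checked how the recursive mirroring behaves on every sub-page of this site, so treat it as a starting point.

```shell
# Hypothetical wrapper: the original mirror command plus --content-on-error,
# so Wget saves the body even when the status code is not a success code.
mirror_ignoring_status() {
  wget --content-on-error --mirror -p --convert-links "$1"
}
```

Usage: mirror_ignoring_status "http://opposedforces.com/parts/impreza/en_g11/type_63/"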

Upvotes: 1
