Sewatech

Reputation: 13

wget mirror returning 404 error despite website being accessible in browser

I'm trying to mirror a website using wget but getting a 404 error, even though the site is accessible through a browser.

Command used:

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent --execute robots=off --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" -P D:\client-sites\maharjanmetal.com.np https://maharjanmetal.com.np/products

Error output:

--2024-10-23 21:12:43-- https://maharjanmetal.com.np/products
Resolving maharjanmetal.com.np (maharjanmetal.com.np)... 149.100.146.116
Connecting to maharjanmetal.com.np (maharjanmetal.com.np)|149.100.146.116|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://maharjanmetal.com.np/products/ [following]
--2024-10-23 21:12:43-- https://maharjanmetal.com.np/products/
Reusing existing connection to maharjanmetal.com.np:443.
HTTP request sent, awaiting response... 404 Not Found
2024-10-23 21:12:44 ERROR 404: Not Found.

What could be causing this 404 error when the site is clearly accessible through a browser? How can I successfully mirror this website using wget?

Environment:

Windows 11

Expected behavior:

wget should download the website content and its assets, creating a local mirror of the site

Actual behavior:

Receiving a 404 error despite the site being accessible through browsers

The command follows a 301 redirect from /products to /products/ but then fails with 404

No files are downloaded

The puzzling part is that the URL is perfectly accessible through browsers but wget consistently gets a 404 error after following the 301 redirect.

Upvotes: 0

Views: 33

Answers (1)

x1337Loser

Reputation: 635

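I checked your target URL and confirmed that the server answers it with a 404 Not Found status, so wget stops there: by default it does not save the body of an error response. A browser still renders whatever body comes back with the 404, which is why the page can look accessible.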

If you still want to download this page, add the --content-on-error flag, which tells wget to save the response body even when the server returns an error status such as 404.

Example:

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent --execute robots=off --content-on-error --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" -P D:\client-sites\maharjanmetal.com.np https://maharjanmetal.com.np/products/
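
To see for yourself which status codes wget receives at each hop, you could pull them out of its log. The following is only a rough sketch: the helper name wget_statuses is made up for illustration, and it assumes a POSIX shell with grep and awk (for example WSL or Git Bash on Windows) and that the log uses the "awaiting response..." wording shown in the question.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): read wget log text on stdin and
# print each HTTP status code it reports, one per line, in order.
wget_statuses() {
  # grep -o emits every "awaiting response... NNN" match, even when several
  # share one line; awk keeps only the last field, the 3-digit code.
  grep -oE 'awaiting response\.\.\. [0-9]{3}' | awk '{print $NF}'
}

# Possible usage (wget logs to stderr, hence 2>&1; --spider avoids saving files):
#   wget --spider --server-response https://maharjanmetal.com.np/products/ 2>&1 | wget_statuses
```

Run against the log in the question, this should list 301 followed by 404, which matches the redirect and the final error.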

Upvotes: 0
