Maxim Andrewson

Reputation: 57

Wget saves file with wrong name and no extension, possibly due to Bash

I've tried this on a few forum threads already. However, I keep getting the same failure as a result.

To replicate the problem:

Here is a URL leading to a forum thread with 6 pages.

http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart/1/vc/1

What I typed into the console was:

wget "http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart/{1..6}/vc/1"

And here is what I got:

    --2018-06-14 10:44:17--  http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart/%7B1..6%7D/vc/1
    Resolving forex.kbpauk.ru (forex.kbpauk.ru)... 185.68.152.1
    Connecting to forex.kbpauk.ru (forex.kbpauk.ru)|185.68.152.1|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: unspecified [text/html]
    Saving to: '1'

    1                                    [  <=>                                       ]  19.50K  58.7KB/s    in 0.3s

    2018-06-14 10:44:17 (58.7 KB/s) - '1' saved [19970]

The file was saved simply as "1", apparently with no extension.

I expected the file to be saved with an .html extension, because it's a web page.

I'm trying to get Wget to work, but if it's possible to do what I want with curl, then I would also accept that as an answer.

Upvotes: 2

Views: 3191

Answers (2)

darnir

Reputation: 5190

Well, there are a couple of issues with what you're trying to do.

  1. The double quotes around your URL prevent Bash brace expansion, so you're not really downloading 6 files, but a single URL with a literal "{1..6}" in it (Wget sends it percent-encoded as %7B1..6%7D, as your log shows). You probably want to drop the quotes around the URL so that Bash can expand it into 6 different parameters.

  2. I notice that all of the pages are called "1", irrespective of their actual page numbers. Wget names each file after the last component of the URL path, which here is always "1" (from .../vc/1), so every page ends up with the same name. That makes it hard for Wget or any other tool to keep a usable copy of the thread.
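To illustrate the first point, here is a minimal Bash sketch that only counts the arguments the shell would pass on; it downloads nothing. The URL prefix is taken from the question, and the variable names (base, quoted, unquoted) are just illustrative.

```shell
#!/usr/bin/env bash
# URL prefix from the question; the array names are illustrative only.
base="http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart"

# Inside double quotes the braces stay literal: one word, one URL.
quoted=("$base/{1..6}/vc/1")

# With the braces outside the quotes, Bash expands them into six words.
unquoted=("$base"/{1..6}/vc/1)

echo "quoted:   ${#quoted[@]} argument(s)"
echo "unquoted: ${#unquoted[@]} argument(s)"
printf '%s\n' "${unquoted[@]}"
```

Keeping the variable part quoted while leaving only the braces unquoted lets the expansion happen without exposing the rest of the URL to word splitting.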

The real way to create a mirror of the forum would be to use this command line:

$ wget -m --no-parent -k --adjust-extension http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart/1

Let me explain what this command does:

-m, --mirror: activates mirror mode (recursion).
--no-parent: asks Wget not to go above the directory it starts from.
-k, --convert-links: edits the HTML pages you download so that the links in them point to the other local pages you have also downloaded. This allows you to browse the forum pages locally without needing to be online.
--adjust-extension: this is the option you were originally looking for. It causes Wget to save the file with an .html extension if it downloads a text/html file but the server did not provide an extension.
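The same command can be spelled with long options only, which may be easier to read in a script. This is a sketch that assembles the arguments in a Bash array (wget_args is an illustrative name) and prints the command rather than running it; the actual call is left commented out.

```shell
#!/usr/bin/env bash
url="http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart/1"

wget_args=(
  --mirror            # -m: recursive mirror mode
  --no-parent         # never ascend above the starting directory
  --convert-links     # -k: rewrite links to point at the local copies
  --adjust-extension  # add .html to text/html downloads that lack it
  "$url"
)

# Print the assembled command for inspection.
printf 'wget'
printf ' %s' "${wget_args[@]}"
printf '\n'

# Uncomment to actually start the mirror (needs network access):
# wget "${wget_args[@]}"
```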

Upvotes: 3

Max Carroll

Reputation: 4849

Simply use the -O switch (capital O) to specify the output filename; otherwise wget defaults to the last part of the URL, which in your case is 1.

So if you wanted to call your file what-i-want-to-call-it.html, you would do:

 wget "http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart/{1..6}/vc/1" -O what-i-want-to-call-it.html

If you type wget --help into the console, you will get a full list of all the options that wget provides.

To verify it has worked, type the following to print the file:

cat what-i-want-to-call-it.html
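If the goal is one file per page, a loop is one possible approach, since passing -O together with several URLs makes wget write all of them into that single file. The sketch below is an assumption about what is wanted, not part of the original answer: download_pages, the page-N.html naming, and the WGET override (handy for a dry run with a stub) are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch pages first..last of a thread, one file per page.
# WGET can be set to another command (e.g. a stub) to preview without network.
download_pages() {
  local base=$1 first=$2 last=$3 page
  for ((page = first; page <= last; page++)); do
    # -O (capital) names the output file; lowercase -o would name a log file.
    "${WGET:-wget}" -O "page-${page}.html" "${base}/${page}/vc/1"
  done
}

# Example call (needs network access):
# download_pages "http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart" 1 6
```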

Upvotes: 1
