Reputation: 118
I'm apparently too oblivious to (w)get all the slides.
The page http://some.uni.edu/~name/slides.html is full of links to pdf & ppt files, and I'd like to download all of the (many) linked files in one go. So far wget creates the directory, but it stays empty.
I tried:
wget -r -A.pdf,.ppt http://some.uni.edu/~name/slides.html
wget -e robots=off -A.pdf,.ppt -r -l1 http://some.uni.edu/~name/slides.html
wget -nd -l -r -e robots=off http://some.uni.edu/~name/slides.html
wget -r -np -R "slides.html" http://some.uni.edu/~name/slides.html
wget -r -np -R "slides.html" http://some.uni.edu/~name/
So for example:
$ wget -r https://web.cs.ucla.edu/~kaoru/
--2018-10-29 21:38:50-- https://web.cs.ucla.edu/~kaoru/
Resolving web.cs.ucla.edu (web.cs.ucla.edu)... 131.179.128.29
Connecting to web.cs.ucla.edu (web.cs.ucla.edu)|131.179.128.29|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 623 [text/html]
Saving to: ‘web.cs.ucla.edu/~kaoru/index.html’
web.cs.ucla.edu/~ka 100%[===================>] 623 --.-KB/s in 0s
2018-10-29 21:38:51 (19.1 MB/s) - ‘web.cs.ucla.edu/~kaoru/index.html’ saved [623/623]
Loading robots.txt; please ignore errors.
--2018-10-29 21:38:51-- https://web.cs.ucla.edu/robots.txt
Reusing existing connection to web.cs.ucla.edu:443.
HTTP request sent, awaiting response... 200 OK
Length: 95 [text/plain]
Saving to: ‘web.cs.ucla.edu/robots.txt’
web.cs.ucla.edu/rob 100%[===================>] 95 --.-KB/s in 0s
2018-10-29 21:38:51 (3.10 MB/s) - ‘web.cs.ucla.edu/robots.txt’ saved [95/95]
--2018-10-29 21:38:51-- https://web.cs.ucla.edu/~kaoru/paper11.gif
Reusing existing connection to web.cs.ucla.edu:443.
HTTP request sent, awaiting response... 200 OK
Length: 10230 (10.0K) [image/gif]
Saving to: ‘web.cs.ucla.edu/~kaoru/paper11.gif’
web.cs.ucla.edu/~ka 100%[===================>] 9.99K --.-KB/s in 0.001s
2018-10-29 21:38:51 (12.3 MB/s) - ‘web.cs.ucla.edu/~kaoru/paper11.gif’ saved [10230/10230]
FINISHED --2018-10-29 21:38:51--
Total wall clock time: 0.9s
Downloaded: 3 files, 11K in 0.001s (12.2 MB/s)
It still doesn't download the files I'm after, only the page itself and one image:
$ ls
index.html  paper11.gif
Upvotes: 0
Views: 327
Reputation: 317
wget -h |grep np,
-np, --no-parent don't ascend to the parent directory
wget -h |grep A,
-A, --accept=LIST comma-separated list of accepted extensions
wget -h |grep r,
-r, --recursive specify recursive download
Try using:
wget -r -np -A pdf,doc https://web.cs.ucla.edu/~harryxu/
Result
tree
└── web.cs.ucla.edu
├── ~harryxu
│ ├── papers
│ │ ├── chianina-pldi21.pdf
│ │ ├── dorylus-osdi21.pdf
│ │ ├── genc-pldi20.pdf
│ │ ├── jaaru-asplos21.pdf
│ │ ├── jportal-pldi21.pdf
│ │ ├── li-sigcomm20.pdf
│ │ ├── trimananda-fse20.pdf
│ │ ├── vigilia-sec18.pdf
│ │ ├── vora-asplos17.pdf
│ │ ├── wang-asplos17.pdf
│ │ ├── wang-osdi18.pdf
│ │ ├── wang-osdi20.pdf
│ │ ├── wang-pldi19.pdf
│ │ └── zuo-eurosys19.pdf
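The same approach should also cover the ppt slides from the question; the exact extension list is just an assumption about what the page links to:
wget -r -np -A pdf,ppt,doc http://some.uni.edu/~name/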
Upvotes: 0
Reputation: 909
Your examples
wget -r -A.pdf,.ppt http://some.uni.edu/~name/slides.html
wget -e robots=off -A.pdf,.ppt -r -l1 http://some.uni.edu/~name/slides.html
wget -nd -l -r -e robots=off http://some.uni.edu/~name/slides.html
wget -r -np -R "slides.html" http://some.uni.edu/~name/slides.html
should not work the way you want, since you are specifically targeting a single html file, namely slides.html. You should be targeting the directory instead.
However, your last example is the closest, I think.
Since @Kingsley's example works for you, you should try this first, and then start rejecting (-R) and accepting (-A) files:
wget -r http://some.uni.edu/~name/
Maybe it should be https!?
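As a concrete next step (the pattern and the extension list are assumptions about the target page), one of these filtered runs might be what you want:
wget -r -np -R "*.html" http://some.uni.edu/~name/
wget -r -np -A pdf,ppt http://some.uni.edu/~name/
Either way wget still has to fetch the html pages to discover the links, but it deletes the rejected ones again afterwards.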
Anyway, if "directory listing" is not permitted (that is controlled by the server), then wget cannot get all the files recursively. It can only get specific files whose names you already know!
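If that turns out to be the case here, a rough workaround (assuming the slides page uses relative href="...pdf" / "...ppt" links) is to fetch the page itself, pull the links out of it, and hand the list back to wget:
# grab the page, extract pdf/ppt links, prefix the base URL, then download the list
wget -q -O slides.html http://some.uni.edu/~name/slides.html
grep -Eo 'href="[^"]*\.(pdf|ppt)"' slides.html | cut -d'"' -f2 \
  | sed 's#^#http://some.uni.edu/~name/#' > files.txt
wget -i files.txt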
Upvotes: 1