Reputation: 12814
Using wget version 1.20.3 or higher...
I am currently using the following command to keep a static "backup" of my blog, restricted to pages under "/blog":
wget --mirror --convert-links --adjust-extension --page-requisites --wait=1 https://example.com/blog
However, some of my blog pages reference static, downloadable files (usually PDFs, so the extensions are known and predictable) that are stored in a top-level "static-files" directory, e.g.,
https://example.com/static-files/file1.pdf
or https://example.com/static-files/file2.png
I would like the behavior of --no-parent, where only pages under /blog are downloaded, but I would also like to archive all linked files that live in the static-files directory.
Is this possible? If not, is there a reasonable compromise?
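For what it's worth, one possible two-pass compromise (a sketch of my own, not something I have run against this site; the paths simply mirror the example URLs above) would be to mirror /blog with --no-parent first, then harvest the static-files links from the saved HTML and fetch them in a second pass:

```shell
# 1) Mirror only /blog, never ascending to the parent directory.
wget --mirror --no-parent --convert-links --adjust-extension \
     --page-requisites --wait=1 https://example.com/blog/

# 2) Extract every /static-files link from the mirrored HTML,
#    de-duplicate, and download the files into matching
#    directories (-x), reading the URL list from stdin.
grep -rhoE 'https://example\.com/static-files/[^"]+' example.com/blog \
  | sort -u \
  | wget --wait=1 -x --input-file=-
```

The second pass only finds links written as absolute URLs in the HTML; relative links would need a slightly broader pattern.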
Upvotes: 1
Views: 834
Reputation: 12814
I think this will do what I need (it definitely downloaded the static files). Though it seemed to download a slightly different set of files than my original command, so I'm not 100% sure what changed:
wget --mirror --convert-links --adjust-extension --page-requisites --wait=1 --include-directories="/s,/blog" https://example.com/blog/
The output directories I have now are blog and static-files. blog contains more than it did before... but I haven't looked into why.
The main difference here is that instead of excluding parent directories, we're only including the directories we want content from.
I welcome anyone to expound on the differences, and explain why this may or may not be the correct answer.
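In case the short "/s" prefix ever matches more (or less) than intended, a more explicit variant (my assumption, untested against this site) is to list both directory names in full, so there is no ambiguity about what --include-directories should match:

```shell
# Same flags as above, but naming both directories in full
# (directory names taken from the example URLs in the question).
wget --mirror --convert-links --adjust-extension --page-requisites \
     --wait=1 --include-directories="/static-files,/blog" \
     https://example.com/blog/
```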
Upvotes: 2