anonymous coward

Reputation: 12814

Is there a simple way to achieve the effect of `--no-parent` and also grab files from a specific top level directory, with wget?

Using wget version 1.20.3 or higher...

I currently use a command like this to keep a static "backup" of my blog, limited to pages under "/blog":

wget --mirror --convert-links --adjust-extension --page-requisites --wait=1 https://example.com/blog

However, some of my blog pages link to static downloadable files (usually PDFs, so the extensions are known and predictable) that are stored in a top-level "static-files" directory, e.g.,

https://example.com/static-files/file1.pdf or https://example.com/static-files/file2.png

I would like the behavior of `--no-parent`, where only pages under /blog are downloaded, but I would also like to archive every linked file that lives in the static-files directory.

Is there a simple way to do that with a single wget command?

If not, is there a reasonable compromise?

Upvotes: 1

Views: 834

Answers (1)

anonymous coward

Reputation: 12814

I think this does what I need (it definitely downloaded the static files). It did seem to download a slightly different set of files, though, so I'm not 100% sure what changed:

wget --mirror --convert-links --adjust-extension --page-requisites --wait=1 --include-directories="/s,/blog" https://example.com/blog/

The output directories I have now are blog and static-files. blog contains more than it did... but I haven't looked at why.
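To pin down what actually differs between the two runs, one option is to compare the file listings of both mirrors. This is only a rough sketch: it assumes each run was saved into its own directory, and the function name `compare_mirrors` and the directory names in the usage line are hypothetical, not part of my original setup.

```shell
# List files that exist in only one of two mirror directories.
# Usage: compare_mirrors OLD_DIR NEW_DIR
compare_mirrors() {
    old_list=$(mktemp)
    new_list=$(mktemp)
    # Record file paths relative to each mirror root, sorted so comm can use them.
    (cd "$1" && find . -type f | LC_ALL=C sort) > "$old_list"
    (cd "$2" && find . -type f | LC_ALL=C sort) > "$new_list"
    # comm -3 hides lines common to both lists:
    #   unindented lines   = only in OLD_DIR
    #   tab-indented lines = only in NEW_DIR
    LC_ALL=C comm -3 "$old_list" "$new_list"
    rm -f "$old_list" "$new_list"
}

# Example (hypothetical directory names):
#   compare_mirrors old-mirror/example.com new-mirror/example.com
```

Anything that shows up tab-indented under blog would be one of the extra files the second command picked up.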

The main difference here is that instead of excluding parent directories, we're only including the directories we want content from.

I welcome anyone to expound on the differences, and explain why this may or may not be the correct answer.

Upvotes: 2
