Reputation: 2480
I'm trying to understand how to use wget to download specific directories from a bunch of different ftp sites with economic data from the US government.
As a simple example, I know that I can download an entire directory using a command like:
wget --timestamping --recursive --no-parent ftp://ftp.bls.gov/pub/special.requests/cew/2013/county/
But I envision running more complex downloads, where I might want to limit a download to a handful of directories. So I've been looking at the --include option. But I don't really understand how it works. Specifically, why doesn't this work:
wget --timestamping --recursive -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/
The following does work, in the sense that it downloads files, but it downloads way more than I need (everything in the 2013 directory, vs just the county subdirectory):
wget --timestamping --recursive -I /pub/special.requests/cew/2013/ ftp://ftp.bls.gov/pub/special.requests/cew/
I can't tell if i'm not understanding something about wget or if my issue is with something more fundamental to ftp server structures.
Thanks for the help!
Upvotes: 6
Views: 20222
Reputation: 124804
Based on this doc it seems that the filtering functions of wget
are very limited.
When using the --recursive
option, wget
will download all linked documents after applying the various filters, such as --no-parent
and -I
, -X
, -A
, -R
options.
In your example:
wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/
This won't download anything, because the -I
option specifies to include only links matching /pub/special.requests/cew/2013/county/
, but on the page /pub/special.requests/cew/
there are no such links, so the download stops there. This will work though:
wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/2013/
... because in this case the /pub/special.requests/cew/2013/
page does have a link to county/
Btw, you can find more details in this doc than on the man
page:
http://www.gnu.org/software/wget/manual/html_node/
Upvotes: 3
Reputation: 229294
can't you simply do (and add the --timestamping/--no-parent etc. as needed)
wget -r ftp://ftp.bls.gov/pub/special.requests/cew/2013/county
The -I seems to work at one directory level at a time, so if we step one step up from county/ we could do:
wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/2013/
But apparently we can't step further up and do
wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/
Upvotes: 2