Al R.

Reputation: 2480

Using wget to download select directories from an FTP server

I'm trying to understand how to use wget to download specific directories from a number of different FTP sites that carry economic data from the US government.

As a simple example, I know that I can download an entire directory using a command like:

wget --timestamping --recursive --no-parent ftp://ftp.bls.gov/pub/special.requests/cew/2013/county/

But I envision running more complex downloads, where I might want to limit a download to a handful of directories, so I've been looking at the --include option. I don't really understand how it works, though. Specifically, why doesn't this work:

wget --timestamping --recursive -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/

The following does work, in the sense that it downloads files, but it downloads far more than I need (everything in the 2013 directory, rather than just the county subdirectory):

wget --timestamping --recursive -I /pub/special.requests/cew/2013/ ftp://ftp.bls.gov/pub/special.requests/cew/

I can't tell whether I'm misunderstanding something about wget or whether my issue is with something more fundamental about FTP server structures.
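To make the "handful of directories" idea concrete, what I eventually have in mind is something along these lines, pulling a few specific subdirectories in a single run (the second path is just a placeholder, not necessarily a real directory on this server):

wget --timestamping --recursive -I /pub/special.requests/cew/2013/county/,/pub/special.requests/cew/2012/county/ ftp://ftp.bls.gov/pub/special.requests/cew/

The man page says -I / --include-directories takes a comma-separated list of directories, which is why I expected this style of command to work.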

Thanks for the help!

Upvotes: 6

Views: 20222

Answers (2)

janos

Reputation: 124804

Based on the wget manual (linked at the end of this answer), it seems that the filtering functions of wget are quite limited.

When using the --recursive option, wget will download all linked documents after applying the various filters, such as --no-parent and the -I, -X, -A, and -R options.

In your example:

wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/

This won't download anything, because the -I option says to include only links under /pub/special.requests/cew/2013/county/, but the listing at /pub/special.requests/cew/ contains no such links, so the download stops there. This will work, though:

wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/2013/

... because in this case the /pub/special.requests/cew/2013/ listing does contain a link to county/.
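Putting that together with the flags from your original command, something like this should limit the download to just the county/ subdirectory (untested against this particular server, so treat it as a sketch):

wget --timestamping --recursive --no-parent -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/2013/

The important part is that the listing at the starting URL already contains links that fall under the -I directory; otherwise the recursion is pruned before it ever reaches county/.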

Btw, you can find more details in this doc than on the man page:

http://www.gnu.org/software/wget/manual/html_node/

Upvotes: 3

nos

Reputation: 229294

Can't you simply do the following (adding --timestamping, --no-parent, etc. as needed)?

 wget -r ftp://ftp.bls.gov/pub/special.requests/cew/2013/county

The -I option seems to work one directory level at a time, so if we go one level up from county/ we can do:

 wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/2013/

But apparently we can't go any further up and do:

 wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/
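If you want a handful of sibling directories rather than just one, -I also takes a comma-separated list, so starting one level up the same pattern should work (untested, and the second directory name is only an illustration):

 wget -r -I /pub/special.requests/cew/2013/county/,/pub/special.requests/cew/2013/state/ ftp://ftp.bls.gov/pub/special.requests/cew/2013/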

Upvotes: 2
