I am trying to recursively download .nc files from: https://satdat.ngdc.noaa.gov/sem/goes/data/full/*/*/*/netcdf/*.nc
A target link looks like this one:
https://satdat.ngdc.noaa.gov/sem/goes/data/full/1992/11/goes07/netcdf/
and I need to exclude this:
https://satdat.ngdc.noaa.gov/sem/goes/data/full/1992/11/goes07/csv/
I do not understand how to use wildcards to define the path in wget.
Also, the following command (a test for the year 1981 only) only downloads subfolders 10, 11 and 12, failing for the {01..09} subfolders:
for i in {01..12};do wget -r -nH -np -x --force-directories -e robots=off https://satdat.ngdc.noaa.gov/sem/goes/data/full/1981/${i}/goes02/netcdf/; done
"I do not understand how to use wildcards to define the path in wget."
According to the GNU Wget manual,
"File name wildcard matching and recursive mirroring of directories are available when retrieving via FTP."
so you must not use wildcards in the URL when working with an HTTP or HTTPS server.
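To illustrate the failure mode (a sketch, not a command worth running): with HTTP(S) the * characters are sent to the server literally as part of the request path, so
wget 'https://satdat.ngdc.noaa.gov/sem/goes/data/full/*/*/*/netcdf/*.nc'
just gets an error response (typically 404) back instead of being expanded into a list of files.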
You might instead combine -r with --accept-regex urlregex, which lets you
"Specify a regular expression to accept (...) the complete URL."
Observe that the regular expression has to match the whole URL. For example, if I wanted the pages linked from the GNU Package blurbs whose paths contain auto, I could do that with
wget -r --level=1 --accept-regex '.*auto.*' https://www.gnu.org/manual/blurbs.html
which downloads the main pages of autoconf, autoconf-archive, autogen and automake. Note: --level=1 is used to prevent descending further than the links shown in the blurbs page.
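Applied to your tree, a sketch along the following lines should work (untested against that server, so treat it as a starting point rather than a definitive recipe). One caveat: --accept-regex is also consulted for the intermediate directory listings, so they must match the expression as well or wget will not descend into them; the alternation below therefore also accepts URLs ending in /. The companion option --reject-regex then prunes the csv branches:
wget -r --level=6 -np -nH -x -e robots=off \
     --accept-regex '(/|\.nc)$' \
     --reject-regex '/csv/' \
     https://satdat.ngdc.noaa.gov/sem/goes/data/full/
The directory index pages are saved too (they match the trailing-slash branch of the regex), and the .nc files end up in the mirrored tree below sem/goes/data/full/. --level=6 is there because the files sit five levels below the start URL (year/month/satellite/netcdf/file.nc) and the default recursion depth of 5 leaves no headroom.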