Reputation: 3557
I have a web server that saves the log files of a web application with a numbered suffix. An example file name would be:
dbsclog01s001.log
dbsclog01s002.log
dbsclog01s003.log
The last 3 digits are the counter, and it can sometimes go up to 100.
I usually open a web browser, browse to the file like:
http://someaddress.com/logs/dbsclog01s001.log
and save the files. This of course gets a bit annoying when there are 50 logs. I tried to come up with a BASH script that uses wget, passing
http://someaddress.com/logs/dbsclog01s*.log
but I am having problems with my script. Anyway, does anyone have a sample of how to do this?
thanks!
Upvotes: 47
Views: 79412
Reputation: 204974
#!/bin/sh
if [ $# -lt 3 ]; then
  echo "Usage: $0 url_format seq_start seq_end [wget_args]" >&2
  exit 1
fi
url_format=$1
seq_start=$2
seq_end=$3
shift 3    # everything left over is passed straight to wget
# expand the printf-style format once per sequence number and feed the URLs to wget on stdin
printf "$url_format\n" $(seq "$seq_start" "$seq_end") | wget -i- "$@"
Save the above as seq_wget, make it executable (chmod +x seq_wget), and then run, for example:
$ ./seq_wget http://someaddress.com/logs/dbsclog01s%03d.log 1 50
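Anything after the third argument is passed straight through to wget, so ordinary wget options work too. For example (the -P download-directory and -nv quiet flags here are just illustrative choices, not part of the original script's requirements):
$ ./seq_wget 'http://someaddress.com/logs/dbsclog01s%03d.log' 1 50 -P logs -nv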
Or, if you have Bash 4.0, you could just type
$ wget http://someaddress.com/logs/dbsclog01s{001..050}.log
Or, if you have curl instead of wget, you could follow Dennis Williamson's answer.
Upvotes: 67
Reputation: 7
Late to the party, but a really easy solution that requires no coding is to use the DownThemAll Firefox add-on, which can retrieve ranges of files. That was my solution when I needed to download 800 consecutively numbered files.
Upvotes: -1
Reputation: 591
You can use shell brace-expansion sequences in the wget URL to download a string of numbers...
wget http://someaddress.com/logs/dbsclog01s00{1..3}.log
This also works with letters: {a..z} {A..Z}
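Note that the fixed 00 prefix in the example above only stays correct while the counter is a single digit. If you have Bash 4.0 or later, a zero-padded range covers the whole run in one go (a sketch using the file names from the question):
wget http://someaddress.com/logs/dbsclog01s{001..100}.log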
Upvotes: 23
Reputation: 101
Oh! This is similar to a problem I ran into when learning bash to automate manga downloads.
Something like this should work:
for a in $(seq 1 999); do
  if [ ${#a} -eq 1 ]; then
    b="00"     # pad single digits with two zeros
  elif [ ${#a} -eq 2 ]; then
    b="0"      # pad double digits with one zero
  else
    b=""       # three digits need no padding
  fi
  echo "$a of 999"
  wget -q http://site.com/path/fileprefix$b$a.jpg
done
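The same zero-padding can also be done in one step with printf, which removes the need for the if/elif chain (a minimal sketch against the same placeholder URL):
for a in $(seq 1 999); do
  wget -q "http://site.com/path/fileprefix$(printf '%03d' "$a").jpg"
done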
Upvotes: 0
Reputation: 3768
Not sure precisely what problems you were experiencing, but it sounds like a simple for loop in bash would do it for you.
for i in {001..999}; do    # {001..999} zero-pads the counter in Bash 4+
    wget http://someaddress.com/logs/dbsclog01s$i.log -O your_local_output_dir_$i
done
Upvotes: 14
Reputation: 40773
Check to see whether your system has seq; then it's easy:
for i in $(seq -f "%03g" 1 10); do wget "http://.../dbsclog${i}.log"; done
If your system has the jot command instead of seq:
for i in $(jot -w "http://.../dbsclog%03d.log" 10); do wget "$i"; done
Upvotes: 0
Reputation: 1118
Interesting task, so I wrote a full script for you (combining several answers and a bit more). Here it is:
#!/bin/bash
# fixed vars
URL=http://domain.com/logs/   # URL address up to the logfile name
PREF=logprefix                # logfile prefix (before the number)
POSTF=.log                    # logfile suffix (after the number)
DIGITS=3                      # how many digits the logfile's number has
DLDIR=~/Downloads             # download directory
TOUT=5                        # wget timeout in seconds
# code
for ((i=1; i<10**DIGITS; ++i))
do
    file=$PREF$(printf "%0${DIGITS}d" $i)$POSTF   # local file name
    dl=$URL$file                                  # full URL to download
    echo "$dl -> $DLDIR/$file"                    # monitoring, can be commented out
    wget -T $TOUT -q "$dl" -O "$DLDIR/$file"
    if [ $? -ne 0 ]                               # stop at the first failed download
    then
        exit
    fi
done
At the beginning of the script you can set the URL, the log file prefix and suffix, how many digits the numbering part has, and the download directory. The loop downloads every logfile it finds and exits automatically at the first non-existent one (wget's timeout keeps it from hanging).
Note that this script assumes that logfile indexing starts at 1, not zero, as in your example.
Hope this helps.
Upvotes: 3
Reputation: 360625
curl seems to support ranges. From the man page:
URL
  The URL syntax is protocol dependent. You'll find a detailed description in RFC 3986.
  You can specify multiple URLs or parts of URLs by writing part sets within braces as in:
    http://site.{one,two,three}.com
  or you can get sequences of alphanumeric series by using [] as in:
    ftp://ftp.numericals.com/file[1-100].txt
    ftp://ftp.numericals.com/file[001-100].txt (with leading zeros)
    ftp://ftp.letters.com/file[a-z].txt
  No nesting of the sequences is supported at the moment, but you can use several ones next to each other:
    http://any.org/archive[1996-1999]/vol[1-4]/part{a,b,c}.html
  You can specify any amount of URLs on the command line. They will be fetched in a sequential manner in the specified order.
  Since curl 7.15.1 you can also specify step counter for the ranges, so that you can get every Nth number or letter:
    http://www.numericals.com/file[1-100:10].txt
    http://www.letters.com/file[a-z:2].txt
You may have noticed that it says "with leading zeros"!
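Applied to the file names from the question, that should make the whole download a single command (a sketch; -O tells curl to keep each remote file name, and the quotes protect the brackets from the shell):
curl -O "http://someaddress.com/logs/dbsclog01s[001-100].log"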
Upvotes: 43
Reputation: 15128
I just had a look at the wget manpage discussion of 'globbing':
By default, globbing will be turned on if the URL contains a globbing character. This option may be used to turn globbing on or off permanently. You may have to quote the URL to protect it from being expanded by your shell. Globbing makes Wget look for a directory listing, which is system-specific. This is why it currently works only with Unix FTP servers (and the ones emulating Unix "ls" output).
So wget http://... won't work with globbing.
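What does work over HTTP is generating the URL list yourself and feeding it to wget on stdin, e.g. with GNU seq (a sketch using the question's file names; %03g produces the zero-padded counter):
seq -f "http://someaddress.com/logs/dbsclog01s%03g.log" 1 100 | wget -i -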
Upvotes: 0
Reputation: 258418
You can use a combination of a for loop in bash with the printf command (of course modifying echo to wget as needed):
$ for i in {1..10}; do echo "http://www.com/myurl$(printf "%03d" $i).html"; done
http://www.com/myurl001.html
http://www.com/myurl002.html
http://www.com/myurl003.html
http://www.com/myurl004.html
http://www.com/myurl005.html
http://www.com/myurl006.html
http://www.com/myurl007.html
http://www.com/myurl008.html
http://www.com/myurl009.html
http://www.com/myurl010.html
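Swapping echo for wget as described turns that directly into the download loop (same placeholder URL as above):
$ for i in {1..10}; do wget "http://www.com/myurl$(printf "%03d" $i).html"; done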
Upvotes: 12
Reputation: 4444
Here you can find a Perl script that looks like what you want:
http://osix.net/modules/article/?id=677
#!/usr/bin/perl
use strict;
use warnings;

my $program = "wget";  # change this to proz if you have it ;-)
my $base_url = "http://www.und.nodak.edu/org/crypto/crypto/lanaki.crypt.class/lessons/lesson";
my $format = ".zip";   # the format of the file to download
my $max = 24;          # the total number of files to download

# the lesson number starts from 1; single digits get a leading zero
for (my $count = 1; $count <= $max; $count++) {
    my $url = $base_url . sprintf("%02d", $count) . $format;
    system("$program $url");
}
Upvotes: 0