pedalpete

Reputation: 21556

wget to download images from retrieved file

I need to retrieve a file, then go through it and download all the images it lists. The format is XML, but I don't want to use an XML parser.

When I use

sudo wget --restrict-file-names=windows -nH -nd -r -i -P images \
-A jpeg,jpg,gif,png https://url.com/api/ojgnvhy75hGvcf36dnJO0947bsh62gbs?_=1361842359357

I get the XML file downloaded, but what I need are the images referenced in that file.

What am I doing wrong here?

Upvotes: 0

Views: 511

Answers (1)

pedalpete

Reputation: 21556

I ended up with the following script. It fetches the XML file and saves it as text, uses sed to extract the links from that text into a second file, then runs wget on that second file to download the images.

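#!/bin/dash

# Save the XML response as plain text.
wget -O xml.txt 'https://url_to_download_from'
# Pull the link out of each line containing an image> tag; the w flag writes the matches to links.txt.
sed -n "/image>/s/^   .\([^>]*\)<\/image>.*/\1/gpw links.txt" xml.txt > /dev/null
# -i expects a file of URLs, so pass links.txt rather than the extracted text itself.
wget -N -P images -A png -i links.txt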

Sadly, this results in a bunch of files that are not images, even though I'm requesting only images.

After this script has completed, I run the following commands to clean up the folder.

cd images
shopt -s extglob nocaseglob
rm !(*.png)
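
The cleanup above needs bash (shopt and extglob), while the download script runs under dash. As a possible alternative, here is a minimal sketch that works in a plain POSIX shell. It assumes links.txt holds one URL per line; the function names and png_links.txt are hypothetical names introduced only for illustration. The first function filters the link list so that non-PNG URLs are never handed to wget, and the second removes any stray files afterwards.

```shell
#!/bin/sh
# Sketch only: links.txt is assumed to hold one URL per line;
# png_links.txt and both function names are hypothetical.

# Print only the URLs whose path ends in .png (case-insensitive),
# allowing an optional ?query or #fragment after the extension.
filter_png_links() {
    grep -iE '\.png([?#].*)?$' "$1"
}

# Delete every regular file under the given directory whose name does
# not end in .png (case-insensitive). -iname is supported by GNU and
# BSD find.
remove_non_png() {
    find "$1" -type f ! -iname '*.png' -exec rm -f -- {} +
}

# Example usage (commented out so that sourcing this file has no side effects):
# filter_png_links links.txt > png_links.txt
# wget -N -P images -i png_links.txt
# remove_non_png images
```

Filtering before the download avoids fetching the unwanted files at all, but it only helps when the URLs actually end in the image extension; the find-based cleanup remains useful as a fallback.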

Upvotes: 0
