veezy_101

How can I correct my wget scripts to download images from the Lunar Orbiter and Lunar Sample open directories?

I've tried countless combinations of wget options, including many taken from answered questions here, without getting what I actually want.

I need help using wget to download all images from a couple of sites, although an example or directions for one site is all I need; I'll take it from there. I've been trying to download all images from the Lunar Orbiter open directory. The files I need are all .png images stored in locations such as:

https://pdsimage2.wr.usgs.gov/Lunar_Orbiter/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5005/
https://pdsimage2.wr.usgs.gov/Lunar_Orbiter/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5006/
https://pdsimage2.wr.usgs.gov/Lunar_Orbiter/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5007/
https://pdsimage2.wr.usgs.gov/Lunar_Orbiter/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5008/
https://pdsimage2.wr.usgs.gov/Lunar_Orbiter/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5009/
https://pdsimage2.wr.usgs.gov/Lunar_Orbiter/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5010/

Each of these folders holds anywhere from 4 to 10 images, and there are over 216 folders in total.

The first folder is ..../FRAME_5005/ and the last is ..../FRAME_5217/.
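A minimal sketch of the folder range above, assuming every frame number from 5005 to 5217 has its own FRAME_xxxx/ folder under LO5/ (the real listing may skip some). The range alone covers 5217 - 5005 + 1 = 213 folder numbers. The function name `list_frame_dirs` is hypothetical.

```shell
# Hypothetical helper: print one pdsimage2 directory URL per frame under LO5/.
# Assumption: every frame number in the range exists as FRAME_<n>/.
list_frame_dirs() {
  local base="https://pdsimage2.wr.usgs.gov/Lunar_Orbiter/LO_1001/EXTRAS/BROWSE/LO5"
  local first=${1:-5005} last=${2:-5217} n
  for ((n = first; n <= last; n++)); do
    printf '%s/FRAME_%d/\n' "$base" "$n"
  done
}
```

Running `list_frame_dirs > frame_dirs.txt` writes the list to a file, which wget can read with `--input-file` (`-i`) so that each folder becomes its own starting URL.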

I thought this would be easy, but it turns out that every image in the listing is just a hyperlink to another domain, which serves the actual image files:

https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5005/FRAME_5005_H1.PNG
https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5005/FRAME_5005_H2.PNG
https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5005/FRAME_5005_H3.PNG
https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5005/FRAME_5005_M.PNG
https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5006/FRAME_5006_H1.PNG
https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5006/FRAME_5006_H2.PNG
https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5006/FRAME_5006_H3.PNG
https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5006/FRAME_5006_M.PNG
https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5007/FRAME_5007_H1.PNG
https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5007/FRAME_5007_H2.PNG
https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5007/FRAME_5007_H3.PNG
https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5007/FRAME_5007_M.PNG
https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5008/FRAME_5008_H1.PNG
https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5008/FRAME_5008_H2.PNG
https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5008/FRAME_5008_H3.PNG
https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5008/FRAME_5008_M.PNG
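Because the file names above follow a visible pattern, one crawl-free option is to build the image URLs directly. This is a sketch under a loud assumption: it guesses that every frame uses only the `_H1`, `_H2`, `_H3` and `_M` names seen for FRAME_5005 to FRAME_5008, so frames with more (or differently named) images would be incomplete. The function name `frame_image_urls` is hypothetical.

```shell
# Hypothetical helper: print candidate CloudFront image URLs for one frame.
# Assumption: the frame uses the _H1, _H2, _H3 and _M names seen above.
frame_image_urls() {
  local frame=$1
  local base="https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/LO5"
  local suffix
  for suffix in H1 H2 H3 M; do
    printf '%s/FRAME_%s/FRAME_%s_%s.PNG\n' "$base" "$frame" "$frame" "$suffix"
  done
}
```

For example, `for n in $(seq 5005 5217); do frame_image_urls "$n"; done > images.txt` followed by `wget -nc -i images.txt` would try every guessed name; names that do not exist on the server would simply produce download errors.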

As you can see, the folder structure on the image host mirrors the structure on the original (starting) domain, minus the Lunar_Orbiter/ prefix.

The open directory's base URL is:

https://pdsimage2.wr.usgs.gov/Lunar_Orbiter/

The actual image-hosting domain is:

https://dmjh6ggf608es.cloudfront.net/

All subfolders below these are the same, as shown above.
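The prefix mapping between the two base URLs above can be sketched as a single `sed` substitution, assuming it holds for every subfolder. The function name `to_cloudfront` is hypothetical.

```shell
# Hypothetical helper: rewrite a pdsimage2 Lunar_Orbiter URL into the matching
# CloudFront URL by swapping the prefix; the rest of the path is kept as-is.
to_cloudfront() {
  printf '%s\n' "$1" |
    sed 's#^https://pdsimage2\.wr\.usgs\.gov/Lunar_Orbiter/#https://dmjh6ggf608es.cloudfront.net/#'
}
```

URLs that do not start with the pdsimage2 Lunar_Orbiter prefix pass through unchanged.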

All I want is to download every single image from the entire site. I've tried many different options and combinations with wget, and I still can't download just the images: I end up with only the site structure, with errors, or with a clone of the site that has no images. To be clear, I want every image on the site below the "BROWSE" level.

For example, everything below this:

https://pdsimage2.wr.usgs.gov/Lunar_Orbiter/LO_1001/EXTRAS/BROWSE/

which actually points to this:

https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/

That way I get all the images in directories "FRAME_5001" through "FRAME_5217". The "LO5" directory one level above "FRAME_5008" is only one of many such directories: they start at "LO1" and continue from there. Each "LOxx" directory has about 10 subdirectories, each with 10-20 photos.
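One way to express "everything below BROWSE" as a single recursive wget run, sketched under assumptions: the pdsimage2 listing pages link straight to dmjh6ggf608es.cloudfront.net (as described above), and the crawl has to start on pdsimage2 because the CloudFront host is assumed not to serve directory listings of its own. `--span-hosts` plus `--domains` lets wget follow links onto the second host; `--accept=png,PNG` lists both cases because wget's suffix matching is case-sensitive unless `--ignore-case` is given, and these files end in `.PNG`. The listing pages are still fetched so wget can read their links, then deleted because they are not in the accept list. `--reject-regex` skips Apache-style column-sort links (`?C=N;O=D`), assuming the listing has them. The function name and the `WGET` override (for a dry run with `WGET=echo`) are hypothetical.

```shell
# Hypothetical wrapper around one recursive wget run.
# Assumptions: the start URL is a pdsimage2 listing whose image links point to
# dmjh6ggf608es.cloudfront.net. Set WGET=echo to print the command instead.
# --no-parent only restricts climbing on the starting host (pdsimage2).
# --wait=1 pauses one second between requests to stay polite to the servers.
download_browse() {
  local start_url=${1:-https://pdsimage2.wr.usgs.gov/Lunar_Orbiter/LO_1001/EXTRAS/BROWSE/}
  "${WGET:-wget}" \
    --recursive \
    --no-parent \
    --span-hosts \
    --domains=pdsimage2.wr.usgs.gov,dmjh6ggf608es.cloudfront.net \
    --accept=png,PNG \
    --reject-regex='[?]C=' \
    --wait=1 \
    "$start_url"
}
```

With wget's default naming, the images should land under `dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/...` in the current directory. If a robots.txt on either host stops the recursion, adding `-e robots=off` is the usual switch to try.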

I made a screenshot for reference, if needed:

[screenshot]

Here is a sample of the different commands and option combinations I tried:

wget -nd -r -H -A png https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/LO5/FRAME_5005/
wget -nd -r -H -A png https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/LO5/
wget -nd -r -H -A png https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/

and

wget -rH -A png -D https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE
wget -rH -A png -D https://dmjh6ggf608es.cloudfront.net/LO_1001/EXTRAS/BROWSE/

and

wget -r -A png,jpeg -m -p -k -l 8 https://dmjh6ggf608es.cloudfront.net/LO_1003/EXTRAS/BROWSE/
wget -r --tries=inf -A png,jpeg -m -p -k -l 8 --directory-prefix="L:\_Main\Astro\Lunar.Sample.Photos\A17VIS_0001\EXTRAS\FULL_RES_JPEG\BASALT\ILMENITE\test2" https://pdsimage2.wr.usgs.gov/Apollo/Lunar_Sample_Photographs/A17VIS_0001/EXTRAS/FULL_RES_JPEG

I tried many more options and combinations, but these were the only ones left in my clipboard manager, and I deleted the downloaded files after getting frustrated.
