Wandering Sophist

Reputation: 143

How do I scrape full-sized images from a website?

I am trying to obtain clinical images of psoriasis patients from these two websites for research purposes:

http://www.dermis.net/dermisroot/en/31346/diagnose.htm

http://dermatlas.med.jhmi.edu/derm/

For the first site, I tried just saving the page with Firefox, but that only saved the thumbnails, not the full-sized images. I was able to reach the full-sized images with the Firefox add-on DownThemAll, but it saved each image as part of a new HTML page and I do not know of any way to extract just the images.

I also tried logging in to one of my university's Linux machines and using wget to mirror the websites, but I could not get it to work and am still not sure why.

Consequently, I am wondering whether it would be easy to write a short script (or use whatever method is easiest) to (a) obtain the full-sized images linked from the first website, and (b) obtain all full-sized images on the second site with "psoriasis" in the filename.

I have been programming for a couple of years, but have zero experience with web development and would appreciate any advice on how to go about doing this.

Upvotes: 1

Views: 5474

Answers (2)

danielbeard

Reputation: 9149

Why not use wget to recursively download images from the domain? Here is an example:

wget -r -P /save/location -A jpeg,jpg,bmp,gif,png http://www.domain.com

Here is the man page: http://www.gnu.org/software/wget/manual/wget.html
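As a rough sketch of how this might look for your two requests - assuming the full-sized images sit within a couple of link hops of the starting page, and that the filenames on the second site really contain "psoriasis" (the save paths and the depth are guesses you would need to adjust):

wget -r -l 2 -nd -P ./dermis -A jpeg,jpg,png --no-parent http://www.dermis.net/dermisroot/en/31346/diagnose.htm

wget -r -l 2 -nd -P ./dermatlas -A "*psoriasis*.jpg" --no-parent http://dermatlas.med.jhmi.edu/derm/

Here -r with -l 2 follows links two levels deep (enough to go thumbnail page -> image page -> image), -nd drops the site's directory structure so the images land in one folder, -P sets the save location, -A keeps only files matching the listed suffixes or pattern, and --no-parent stops wget from wandering up the site.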

Upvotes: 2

aretai

Reputation: 1641

Try HTTrack Website Copier - it will download all of the images on a website. You can also try http://htmlparser.sourceforge.net/; it can capture a site along with its resources if you use org.htmlparser.parserapplications.SiteCapturer.
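If you go the HTTrack route, its command-line version accepts scan-rule filters, so something like the following might cover the psoriasis-only case - a sketch, assuming the image filenames really contain "psoriasis" and that ./dermatlas is where you want the files:

httrack "http://dermatlas.med.jhmi.edu/derm/" -O ./dermatlas "+*psoriasis*.jpg" -v

Here -O sets the output directory, the "+" pattern is an HTTrack scan rule that whitelists matching URLs, and -v gives verbose output.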

Upvotes: 1
