Dannyd86

Reputation: 87

downloading images from webpages using php

I am trying to create a PHP function that downloads images from a webpage you pass in as a parameter. However, the webpage is a kind of gallery that only shows small thumbnail versions of the images, with each thumbnail linking directly to the larger full-size JPEG that I want to download to my local computer. So the images will not be downloaded from the gallery page itself, but rather from the individual links on that page to the JPEG files.

So for example:

www.somesite.com/galleryfullofimages/

is the location of the image gallery,

and each jpeg image file from the gallery that I want is then located at something like:

www.somesite.com/galleryfullofimages/images/01.jpg
www.somesite.com/galleryfullofimages/images/02.jpg
www.somesite.com/galleryfullofimages/images/03.jpg

What I've been trying to do so far is use the file_get_contents function to get the full HTML of the webpage as a string, then isolate all of the <a href="images/01.jpg"> elements inside the quotes and put them into an array. Then I can use this array to locate each image and download them all with a loop.

This is what I have done so far:

<?php

$link = "http://www.somesite.com/galleryfullofimages/";
$contents = file_get_contents($link);

// This splits the HTML *around* the anchor tags; preg_split returns
// the text between matches, not the matched links themselves.
$results = preg_split('/<a href="[^"]*"/', $contents);

?>

But I am stuck at this point. I am also totally new to regular expressions, which, as you can see, I tried to use. How can I isolate each image link and then download the image? Or is there a better way of doing this altogether? I have also read about using cURL, but I can't seem to implement that either.

I hope this all makes sense. Any help will be greatly appreciated.

Upvotes: 1

Views: 1510

Answers (1)

jimp

Reputation: 17487

This is commonly known as "scraping" a website. You are already retrieving the markup for the page, so you are off to a good start.

Here's what you need to do next:

<?php
// Load the retrieved markup into a DOM object using PHP's
// DOMDocument::loadHTML method. The @ suppresses the warnings
// that real-world, non-well-formed HTML usually triggers.
    $docObj = new DOMDocument();
    @$docObj->loadHTML($contents);

// Create an XPath object.
    $xpathObj = new DOMXPath($docObj);

// Query for all a tags whose href attribute ends in ".jpg". You can
// get very creative here, depending on your understanding of XPath.
// For example, '//a/@href' would return the href attributes directly.
// Note that PHP's DOMXPath implements XPath 1.0, which has no
// ends-with() function, so the test is emulated with substring().
    $elements = $xpathObj->query('//a[substring(@href, string-length(@href) - 3) = ".jpg"]');

// Process the discovered image URLs. You could use cURL for this,
// or file_get_contents again (since your host has allow_url_fopen
// enabled) to fetch each image and then store it locally.
    foreach ($elements as $domNode)
    {
        $url = $domNode->getAttribute('href');

        // The hrefs are relative ("images/01.jpg"), so resolve them
        // against the gallery URL before fetching.
        $absolute = rtrim($link, '/') . '/' . $url;
        file_put_contents(basename($url), file_get_contents($absolute));
    }
?>
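
Since you mentioned cURL: here is a minimal sketch of the download step using cURL instead of file_get_contents. The downloadImage helper is just an illustrative name, not a library function:

<?php
// Fetch one image over HTTP with cURL and save it locally.
function downloadImage($imageUrl, $saveAs)
{
    $ch = curl_init($imageUrl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $data = curl_exec($ch);

    if ($data === false) {
        echo 'cURL error: ' . curl_error($ch) . "\n";
    } else {
        file_put_contents($saveAs, $data);
    }

    curl_close($ch);
}

downloadImage('http://www.somesite.com/galleryfullofimages/images/01.jpg', '01.jpg');
?>

Inside the loop above, you would call downloadImage($absolute, basename($url)) in place of the file_get_contents line.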

DOMDocument::loadHTML
XPath
DOMXPath::query
allow_url_fopen
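
As noted in the comments above, the query can also return the href attributes directly rather than the anchor elements. A sketch of that variation, reusing $xpathObj from the code above:

<?php
// '//a/@href' yields DOMAttr nodes; ->value holds the URL text.
$hrefs = $xpathObj->query('//a/@href');

foreach ($hrefs as $attr) {
    if (substr($attr->value, -4) === '.jpg') {
        echo $attr->value . "\n";
    }
}
?>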

Upvotes: 4
