Scrape FULL image src with PHP

Question

I am trying to scrape img src's with php, I can get the src fine, but if the src does not include the full path then I can't really reuse it. Is there a way to grab the full path of the image using php (browsers can get it if you use the right click menu).

ie. How do I get a FULL path including the domain in one of the following two examples?

src="../foo/logo.png"
src="/images/logo.png"

Thanks,

Allan

mpen · Accepted Answer

You don't need a regex... just some patience. I don't really want to write the code for you, but just check if the src starts with http://, and if not, you have like 3 different cases.

If it begins with a / then prepend http://domain.com
If it begins with .. you'll have to split the full URL and hack off pieces until the src starts with a /
Else (it begins with a letter), the take the full domain, and strip it down to the last slash then append the src URL.

Or.... be lazy and steal this script

$url = "http://www.goat.com/money/dave.html";
$rel = "../images/cheese.jpg";

$com = InternetCombineURL($url,$rel);

//  Returns http://www.goat.com/images/cheese.jpg

function InternetCombineUrl($absolute, $relative) {
    $p = parse_url($relative);
    if($p["scheme"])return $relative;
    
    extract(parse_url($absolute));
    
    $path = dirname($path); 

    if($relative{0} == '/') {
        $cparts = array_filter(explode("/", $relative));
    }
    else {
        $aparts = array_filter(explode("/", $path));
        $rparts = array_filter(explode("/", $relative));
        $cparts = array_merge($aparts, $rparts);
        foreach($cparts as $i => $part) {
            if($part == '.') {
                $cparts[$i] = null;
            }
            if($part == '..') {
                $cparts[$i - 1] = null;
                $cparts[$i] = null;
            }
        }
        $cparts = array_filter($cparts);
    }
    $path = implode("/", $cparts);
    $url = "";
    if($scheme) {
        $url = "$scheme://";
    }
    if($user) {
        $url .= "$user";
        if($pass) {
            $url .= ":$pass";
        }
        $url .= "@";
    }
    if($host) {
        $url .= "$host/";
    }
    $url .= $path;
    return $url;
}

From http://www.web-max.ca/PHP/misc_24.php

Scrape FULL image src with PHP

Answers (2)

Related Questions