aiternal

Reputation: 1100

How to get a website's favicon with PHP?

I want to get a requested website's favicon with PHP. I have been recommended Google's favicon service, but it is not functional. I want to do it on my own, but I don't know how to use regex.

I found a class on Google that works in most cases, but it has an unacceptable error rate. You can have a look here: http://www.controlstyle.com/articles/programming/text/php-favicon/

Can somebody please help me get the favicon using regex?

Upvotes: 24

Views: 34767

Answers (15)

user1609889

Reputation: 21

People often suggest looking for favicon.ico first, but that can fail if the website has that file while a different icon is defined in the HTML. So my function looks for the icon defined in the page first.

function get_favicon($url) {
    # make the URL simpler: keep only scheme and host
    $elems = parse_url($url);
    if (!isset($elems['scheme'], $elems['host'])) return false;

    $url = $elems['scheme'] . '://' . $elems['host'];

    $output = @file_get_contents($url); // fetch the page once; false if the URL can't be read
    if ($output !== false) {
        $regex_pattern = "/rel=\"(shortcut )?icon\" (?:href=[\'\"]([^\'\"]+)[\'\"])?/";
        preg_match_all($regex_pattern, $output, $matches);
        # the href group is optional, so it can be set but empty: use !empty(), not isset()
        if (!empty($matches[2][0])) {
            $favicon = $matches[2][0];
            # check if absolute url or relative path
            $favicon_elems = parse_url($favicon);
            # if relative
            if(!isset($favicon_elems['host'])) {
                if(substr($favicon, 0, 1) == '/') $favicon = substr($favicon, 1);
                $favicon = "$url/$favicon";
            }
            return $favicon;
        }
        $favicon = "$url/favicon.ico";
        if(@file_get_contents($favicon, false, NULL, 0, 1)) {  //Check the first byte to see if favicon.ico exists
            return $favicon;
        }
        $favicon = "$url/favicon.png";
        if(@file_get_contents($favicon, false, NULL, 0, 1)) {  //Check the first byte to see if favicon.png exists
            return $favicon;
        }
    }
    return false;
}
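Since the question asks for regex: the pattern above only matches when `href` follows `rel` directly, separated by exactly one space and written in double quotes. A hedged sketch of a more tolerant two-step regex (find the `<link>` tag first, then pull its `href`) could look like this. `favicon_href_from_html()` is a hypothetical name, and a regex will still miss unusual markup that a DOM parser would handle.

```php
<?php
// Hypothetical helper: two-step regex that does not care about attribute
// order, quote style or capitalization.
function favicon_href_from_html(string $html): ?string
{
    // Step 1: a <link ...> tag whose rel value contains the word "icon".
    // The lookarounds reject "apple-touch-icon" (icon preceded by "-").
    $tagPattern = '/<link\b[^>]*\brel\s*=\s*["\']?[^"\'>]*(?<![\w-])icon(?![\w-])[^>]*>/i';
    if (!preg_match($tagPattern, $html, $tag)) {
        return null;
    }
    // Step 2: the href value inside that tag, in single or double quotes
    if (!preg_match('/\bhref\s*=\s*(["\'])(.*?)\1/i', $tag[0], $href)) {
        return null;
    }
    $value = trim($href[2]);
    return $value === '' ? null : $value;
}
```

The returned href may still be relative, so it has to be made absolute the same way `get_favicon()` does.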

Upvotes: 0

adilbo

Reputation: 970

PHP Grab Favicon

This is a convenient way, with many parameters, to get the favicon from a page URL.

How it Works

  1. Check whether the favicon already exists locally (or no local copy is wanted); if so, return the path and filename
  2. Otherwise, load the URL and try to match the favicon location with a regex
  3. If there is a match, make the favicon link absolute
  4. If there is no favicon, try to get one from the domain root
  5. If there is still no favicon, randomly try the Google, faviconkit or favicongrabber API
  6. If the favicon should be saved, try to load the favicon URL
  7. If wished, save the favicon for next time and return the path and filename

So it combines both approaches: try to get the favicon from the page, and if that doesn't work, use an "API" service that returns the favicon ;-)

<?php
/*

PHP Grab Favicon
================

> This `PHP Favicon Grabber` uses a given URL, saves a copy (if wished) and returns the image path.

How it Works
------------

1. Check whether the favicon already exists locally (or no local copy is wanted); if so, return the path and filename
2. Otherwise, load the URL and try to match the favicon location with a regex
3. If there is a match, make the favicon link absolute
4. If there is no favicon, try to get one from the domain root
5. If there is still no favicon, randomly try the Google, faviconkit or favicongrabber API
6. If the favicon should be saved, try to load the favicon URL
7. If wished, save the favicon for next time and return the path and filename

How to Use
----------

```PHP
$url = 'example.com';

$grap_favicon = array(
'URL' => $url,   // URL of the page to get the favicon from
'SAVE'=> true,   // Save a local copy of the favicon (true) or only return the favicon URL (false)
'DIR' => './',   // Local dir where the copy of the favicon should be saved
'TRY' => true,   // Try to get the favicon from the page (true) or only use the APIs (false)
'OVR' => false,  // Skip if the file already exists locally (false) or overwrite it (true)
'DEV' => null,   // Print all debug messages ('debug') or just do the work (null)
);

echo '<img src="'.grap_favicon($grap_favicon).'">';
```

Todo
----
Optionally split the download dir into several sub-dirs (MD5 segment of the filename, e.g. /af/cd/example.com.png) if there are a lot of favicons.

Infos about Favicon
-------------------
https://github.com/audreyr/favicon-cheat-sheet

###### Copyright 2019-2023 Igor Gaffling

*/

$time_start = microtime(true);

/* Defaults */

$debug = null;
$consoleMode = false;
$overWrite = false;
$localPath = "./";
$saveLocal = true;
$tryHomepage = true;
$testURLs = array();

/* Detect Console Mode, can be overridden with switches */

if (php_sapi_name() == "cli") { $consoleMode = true; }

if ($consoleMode) { $script_name = basename(__FILE__); } else { $script_name = basename($_SERVER['PHP_SELF']); }

/* Command Line Options */
$shortopts  = "";
$shortopts .= "l::";
$shortopts .= "p::";
$shortopts .= "h?";

$longopts  = array(
  "list::",
  "path::",
  "user-agent::",
  "curl-timeout::",
  "tryhomepage",
  "onlyuseapis",
  "store",
  "nostore",
  "save",
  "nosave",
  "overwrite",
  "skip",
  "curl-verbose",
  "consolemode",
  "noconsolemode",
  "debug",
  "help",
);


$options = getopt($shortopts, $longopts);

if ((isset($options['help'])) || (isset($options['h'])) || (isset($options['?'])))
{
  echo "Usage: $script_name (Switches)\n\n";
  echo "--list=FILE/LIST            Filename or a delimited list of URLs to check.  Lists can be separated with space, comma or semi-colon.\n";
  echo "--path=PATH                 Location to store icons (default is $localPath)\n";
  echo "\n";
  echo "--tryhomepage               Try homepage first, then APIs.  (default is true)\n";
  echo "--onlyuseapis               Only use APIs.\n";
  echo "--store                     Store favicons locally. (default is true)\n";
  echo "--nostore                   Do not store favicons locally.\n";
  echo "--overwrite                 Overwrite local favicons (default is false)\n";
  echo "--skip                      Skip local favicons if they are already present. (default is true)\n";
  echo "--consolemode               Force console output.\n";
  echo "--noconsolemode             Force HTML output.\n";
  echo "--debug                     Enable debug messages.\n";
  echo "--user-agent=AGENT_STRING   Customize the user agent.\n";
  echo "--curl-verbose              Enable cURL verbose.\n";
  echo "--curl-timeout=SECONDS      Set cURL timeout (default is 60).\n";
  echo "\n";
  exit;
}

/* Process Options */

$opt_list = null;
$opt_localpath = null;
$opt_usestdin = null;
$opt_tryhomepage = null;
$opt_storelocal = null;
$opt_debug = null;
$opt_console = null;
$opt_timeout = null;
$opt_curl_user_agent = null;
$opt_curl_verbose = false;
$opt_curl_timeout = null;

if (isset($options['debug'])) { $opt_debug = true; }
if (isset($options['list'])) { $opt_list = $options['list']; }
if (isset($options['path'])) { $opt_localpath = $options['path']; }
if (isset($options['l'])) { $opt_list = $options['l']; }
if (isset($options['p'])) { $opt_localpath = $options['p']; }
if (isset($options['consolemode'])) { $opt_console = true; }
if (isset($options['store'])) { $opt_storelocal = true; }
if (isset($options['save'])) { $opt_storelocal = true; }
if (isset($options['skip'])) { $overWrite = false; }
if (isset($options['tryhomepage'])) { $opt_tryhomepage = true; }
if (isset($options['nostore'])) { $opt_storelocal = false; }
if (isset($options['nosave'])) { $opt_storelocal = false; }
if (isset($options['onlyuseapis'])) { $opt_tryhomepage = false; }
if (isset($options['noconsolemode'])) { $opt_console = false; }
if (isset($options['overwrite'])) { $overWrite = true; }
if (isset($options['user-agent'])) { $opt_curl_user_agent = $options['user-agent']; }
if (isset($options['curl-verbose'])) { $opt_curl_verbose = true; }
if (isset($options['curl-timeout'])) { $opt_curl_timeout = $options['curl-timeout']; }

if (!is_null($opt_localpath)) { if (file_exists($opt_localpath)) { $localPath = $opt_localpath; } }
if (!is_null($opt_tryhomepage)) { $tryHomepage = $opt_tryhomepage; }
if (!is_null($opt_storelocal)) { $saveLocal = $opt_storelocal; }
if (!is_null($opt_debug)) { if ($opt_debug) { $debug = "debug"; } else { $debug = null; } }
if (!is_null($opt_console)) { $consoleMode = $opt_console; }
if (!is_null($opt_curl_timeout)) { if (is_numeric($opt_curl_timeout)) { if ($opt_curl_timeout >= 0 && $opt_curl_timeout < 600) { $opt_timeout = $opt_curl_timeout; } } }

if (isset($opt_list)) {
  if (file_exists($opt_list)) {
    $testURLs = file($opt_list,FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
  } else {
    if (count($testURLs) == 0) {
      $testURLs = explode(",",str_replace(array(",",";"," "),",",$opt_list));
    }
  }
}

if (is_null($opt_curl_user_agent)) { if (isset($_SERVER['SERVER_NAME'])) { $opt_curl_user_agent = 'FaviconBot/1.0 (+http://'.$_SERVER['SERVER_NAME'].'/)'; } else { $opt_curl_user_agent = 'FaviconBot/1.0/'; } }
if (strtolower($opt_curl_user_agent) != "none") { setGlobal('curl_useragent', $opt_curl_user_agent); }
setGlobal('curl_verbose', $opt_curl_verbose);
if (!is_null($opt_timeout)) { setGlobal('curl_timeout', $opt_timeout); }

if (count($testURLs) == 0) {
  $testURLs = array(
    'http://aws.amazon.com',
    'http://www.apple.com',
    'http://www.dribbble.com',
    'http://www.github.com',
    'http://www.intercom.com',
    'http://www.indiehackers.com',
    'http://www.medium.com',
    'http://www.mailchimp.com',
    'http://www.netflix.com',
    'http://www.producthunt.com',
    'http://www.reddit.com',
    'http://www.slack.com',
    'http://www.soundcloud.com',
    'http://www.stackoverflow.com',
    'http://www.techcrunch.com',
    'http://www.trello.com',
    'http://www.vimeo.com',
    'https://www.whatsapp.com/',
    'https://www.gaffling.com/',
  );
}

$favicons = array();
foreach ($testURLs as $url) {
  $grap_favicon = array(
    'URL' => $url,          // URL of the page to get the favicon from
    'SAVE'=> $saveLocal,    // Save a local copy of the favicon (true) or only return the favicon URL (false)
    'DIR' => $localPath,    // Local dir where the copy of the favicon should be saved
    'TRY' => $tryHomepage,  // Try to get the favicon from the page (true) or only use the APIs (false)
    'OVR' => $overWrite,    // Overwrite existing local files or skip
    'DEV' => $debug,        // Print all debug messages ('debug') or just do the work (null)
  );
  $favicons[] = grap_favicon($grap_favicon, $consoleMode);
}

foreach ($favicons as $favicon) {
  if (!empty($favicon))
  {
    if ($consoleMode) {
      echo "Icon: $favicon\n";
    } else {
      echo '<img title="'.$favicon.'" style="width:32px;padding-right:32px;" src="'.$favicon.'">';
    }
  }
}

if ($consoleMode) {
  echo "\nRuntime: ".round(microtime(true)-$time_start,2)." Sec.\n";
} else {
  echo '<br><br><tt>Runtime: '.round((microtime(true)-$_SERVER["REQUEST_TIME_FLOAT"]),2).' Sec.';
}

/*  FUNCTIONS */
function grap_favicon($options=array(), $consoleMode = false) {
  if (!$consoleMode) {
    // avoid script runtime timeout
    $max_execution_time = ini_get("max_execution_time");
    set_time_limit(0); // 0 = no timelimit
  }

  $curlTimeout = getGlobal('curl_timeout');

  // Ini Vars
  $url       = (isset($options['URL']))?$options['URL']:'gaffling.com';
  $save      = (isset($options['SAVE']))?$options['SAVE']:true;
  $directory = (isset($options['DIR']))?$options['DIR']:'./';
  $trySelf   = (isset($options['TRY']))?$options['TRY']:true;
  $DEBUG     = (isset($options['DEV']))?$options['DEV']:null;
  $overwrite = (isset($options['OVR']))?$options['OVR']:false;

  // URL to lower case
  $url = strtolower($url);

  // Get the Domain from the URL (empty if no host can be parsed, e.g. a bare 'example.com')
  $domain = (string) parse_url($url, PHP_URL_HOST);

  // Check Domain
  $domainParts = explode('.', $domain);
  if (count($domainParts) == 3 and $domainParts[0] != 'www') {
    // With Subdomain (if not www)
    $domain = $domainParts[0].'.'.
              $domainParts[count($domainParts)-2].'.'.$domainParts[count($domainParts)-1];
  } else if (count($domainParts) >= 2) {
    // Without Subdomain
    $domain = $domainParts[count($domainParts)-2].'.'.$domainParts[count($domainParts)-1];
  } else {
    // Without http(s)
    $domain = $url;
  }

  // FOR DEBUG ONLY
  if ($consoleMode) {
    if($DEBUG=='debug')echo "Domain: $domain\n";
  } else {
    if($DEBUG=='debug')print('<b style="color:red;">Domain</b> #'.@$domain.'#<br>');
  }

  // If $trySelf == TRUE, try the page itself first (otherwise only the APIs are used)
  if (isset($trySelf) && $trySelf == TRUE) {

    // Load Page
    $html = load($url, $DEBUG, $consoleMode, $curlTimeout);

    if (empty($html)) {
      if ($consoleMode) {
        if($DEBUG=='debug')echo "No data received\n";
      }
    } else {
      if ($consoleMode) {
        if($DEBUG=='debug')echo "Attempting RegEx Match\n";
      }
      // Find Favicon with RegEx
      $regExPattern = '/((<link[^>]+rel=.(icon|shortcut\sicon|alternate\sicon)[^>]+>))/i';
      if (@preg_match($regExPattern, $html, $matchTag)) {
        if ($consoleMode) {
          if($DEBUG=='debug')echo "RegEx Initial Pattern Matched\n";
          if($DEBUG=='debug')echo print_r($matchTag, true) . "\n";
        }
        $regExPattern = '/href=(\'|\")(.*?)\1/i';
        if (isset($matchTag[1]) && @preg_match($regExPattern, $matchTag[1], $matchUrl)) {
          if ($consoleMode) {
            if($DEBUG=='debug')echo "RegEx Secondary Pattern Matched\n";
          }
          if (isset($matchUrl[2])) {
            if ($consoleMode) {
              if($DEBUG=='debug')echo "Found Match, Building Link\n";
            }

            // Build Favicon Link
            $favicon = rel2abs(trim($matchUrl[2]), 'http://'.$domain.'/');

            // FOR DEBUG ONLY
            if ($consoleMode) {
              if($DEBUG=='debug')echo "Match $favicon\n";
            } else {
              if($DEBUG=='debug')print('<b style="color:red;">Match</b> #'.@$favicon.'#<br>');
            }
          } else {
            if ($consoleMode) {
              if($DEBUG=='debug')echo "Failed To Find Match\n";
            }
          }
        } else {
          if ($consoleMode) {
            if($DEBUG=='debug')echo "RegEx Secondary Pattern Failed To Match\n";
          }
        }
      } else {
        if ($consoleMode) {
          if($DEBUG=='debug')echo "RegEx Initial Pattern Failed To Match\n";
        }
      }
    }

    // If there is no Match: Try if there is a Favicon in the Root of the Domain
    if (empty($favicon)) {
      $favicon = 'http://'.$domain.'/favicon.ico';
      if ($consoleMode) {
        if($DEBUG=='debug')echo "Attempting Direct Match\n";
      }

      // Try to Load Favicon
      # if ( !@getimagesize($favicon) ) {
      # https://www.php.net/manual/en/function.getimagesize.php
      # Do not use getimagesize() to check that a given file is a valid image.
      if ($consoleMode) {
        if($DEBUG=='debug')echo "$favicon\n";
      }
      $fileExtension = geticonextension($favicon,false);
      if (is_null($fileExtension)) {
        unset($favicon);
        if ($consoleMode) {
          if($DEBUG=='debug')echo "Failed Direct Match\n";
        }
      }
    }
  } // END If $trySelf == TRUE

  // If nothing works: Get the Favicon from API
  if ((!isset($favicon)) || (empty($favicon))) {
    if ($consoleMode) {
      if($DEBUG=='debug')echo "Attempting API Match\n";
    }

    // Select API by Random
    $random = rand(1,3);

    // Faviconkit API
    if (($random == 1) || (empty($favicon))) {
      if ($consoleMode) {
        if($DEBUG=='debug')echo "API: Selected FavIconKit\n";
      }
      $favicon = 'https://api.faviconkit.com/'.$domain.'/16';
    }

    // Favicongrabber API
    if (($random == 2) || (empty($favicon))) {
      if ($consoleMode) {
        if($DEBUG=='debug')echo "API: Selected FavIconGrabber\n";
      }
      $echo = json_decode(load('http://favicongrabber.com/api/grab/'.$domain,FALSE),TRUE);

      // Get Favicon URL from Array out of json data (@ if something went wrong)
      $favicon = @$echo['icons']['0']['src'];
    }

    // Google API (check also md5() later)
    if ($random == 3) {
      if ($consoleMode) {
        if($DEBUG=='debug')echo "API: Selected Google\n";
      }
      $favicon = 'http://www.google.com/s2/favicons?domain='.$domain;
    }

    // FOR DEBUG ONLY
    if ($consoleMode) {
      if($DEBUG=='debug')echo "API ($random): Result: $favicon\n";
    } else {
      if($DEBUG=='debug')print('<b style="color:red;">'.$random.'. API</b> #'.@$favicon.'#<br>');
    }

  } // END If nothing works: Get the Favicon from API


  // If Favicon should be saved
  if ((isset($save)) && ($save == TRUE)) {
    unset($content);

    if ($consoleMode) {
      if($DEBUG=='debug')echo "Attempting to load favicon\n";
    }

    //  Load Favicon
    $content = load($favicon, $DEBUG, $consoleMode, $curlTimeout);

    if (empty($content)) {
      if ($consoleMode) {
        if($DEBUG=='debug')echo "Failed to load favicon\n";
      }
    } else {
      // If Google API don't know and deliver a default Favicon (World)
      if (isset($random) && $random == 3 && md5($content) == '3ca64f83fdcf25135d87e08af65e68c9') {
        $domain = 'default'; // so we don't save a default icon for every domain again

        // FOR DEBUG ONLY
        if ($consoleMode) {
          if($DEBUG=='debug')echo "Google: #use default icon#\n";
        } else {
          if($DEBUG=='debug')print('<b style="color:red;">Google</b> #use default icon#<br>');
        }
      }

      //  Get Type
      if (!empty($favicon)) {
        $fileExtension = geticonextension($favicon);
        if (is_null($fileExtension)) {
          if ($consoleMode) {
            if($DEBUG=='debug')echo "Invalid File Type for $favicon\n";
          } else {
            if($DEBUG=='debug')print('<b style="color:red;">Write-File</b> #INVALID_IMAGE#<br>');
          }
        } else {
          $filePath = preg_replace('#\/\/#', '/', $directory.'/'.$domain.'.'.$fileExtension);

          //  If overwrite, delete it
          if (file_exists($filePath)) { if ($overwrite) { unlink($filePath); } }

          //  If file exists, skip
          if (file_exists($filePath)) {
            // FOR DEBUG ONLY
            if ($consoleMode) {
              if($DEBUG=='debug')echo "Skipping File $filePath\n";
            } else {
              if($DEBUG=='debug')print('<b style="color:red;">Skip-File</b> #'.@$filePath.'#<br>');
            }
          } else {
            // Write
            $fh = @fopen($filePath, 'wb');
            fwrite($fh, $content);
            fclose($fh);
            // FOR DEBUG ONLY
            if ($consoleMode) {
              if($DEBUG=='debug')echo "Writing File $filePath\n";
            } else {
              if($DEBUG=='debug')print('<b style="color:red;">Write-File</b> #'.@$filePath.'#<br>');
            }
          }
        }
      }
    }
  } else {
    // Don't save Favicon local, only return Favicon URL
    $filePath = $favicon;
  }

  // FOR DEBUG ONLY
  if ($DEBUG=='debug') {
    // Load the Favicon from local file
    if (!empty($filePath)) {
      if (!function_exists('file_get_contents')) {
        $fh = @fopen($filePath, 'r');
        while (!feof($fh)) {
          $content .= fread($fh, 128); // Because filesize() will not work on URLS?
        }
        fclose($fh);
      } else {
        $content = file_get_contents($filePath);
      }
      if ($consoleMode) {
        echo geticonextension($filePath) . " format file loaded from $filePath\n";
      } else {
      print('<b style="color:red;">Image</b> <img style="width:32px;"
             src="data:image/png;base64,'.base64_encode($content).'"><hr size="1">');
      }
    }
  }


  if (!$consoleMode) {
    // reset script runtime timeout
    set_time_limit($max_execution_time); // set it back to the old value
  }

  // Return Favicon Url
  return $filePath;

} // END MAIN Function

/* HELPER load use curl or file_get_contents (both with user_agent) and fopen/fread as fallback */
function load($url, $DEBUG, $consoleMode = false, $timeOut = 60) {
  if (function_exists('curl_version')) {
    if (!isset($timeOut)) { $timeOut = 60; }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_USERAGENT, getGlobal('curl_useragent'));
    curl_setopt($ch, CURLOPT_VERBOSE, getGlobal('curl_verbose'));
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeOut);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $content = curl_exec($ch);
    if ( $DEBUG=='debug' ) { // FOR DEBUG ONLY
      $http_code = curl_getinfo($ch);
      if ($consoleMode) {
        echo "cURL: '$url' ".$http_code['http_code']."\n";
      } else {
        print('<b style="color:red;">cURL</b> #'.$http_code['http_code'].'#<br>');
      }
    }
    curl_close($ch);
    unset($ch);
  } else {
    $context = array ( 'http' => array (
        'user_agent' => 'FaviconBot/1.0 (+http://'.$_SERVER['SERVER_NAME'].'/)'),
    );
    $context = stream_context_create($context);
    if (!function_exists('file_get_contents')) {
      $fh = fopen($url, 'r', FALSE, $context);
      $content = '';
      while (!feof($fh)) {
        $content .= fread($fh, 128); // Because filesize() will not work on URLS?
      }
      fclose($fh);
    } else {
      $content = file_get_contents($url, false, $context);
    }
  }
  return $content;
}

/* HELPER: Change URL from relative to absolute */
function rel2abs($rel, $base) {
    extract(parse_url($base));
    if (strpos( $rel,"//" ) === 0) return $scheme . ':' . $rel;
    if (parse_url( $rel, PHP_URL_SCHEME ) != '') return $rel;
    if ($rel[0] == '#' or $rel[0] == '?') return $base . $rel;
    $path = preg_replace( '#/[^/]*$#', '', $path);
    if ($rel[0] ==  '/') $path = '';
    $abs = $host . $path . "/" . $rel;
    $abs = preg_replace( "/(\/\.?\/)/", "/", $abs);
    $abs = preg_replace( "/\/(?!\.\.)[^\/]+\/\.\.\//", "/", $abs);
    return $scheme . '://' . $abs;
}

/* GET ICON IMAGE TYPE  */
function geticonextension($url, $noFallback = false) {
  $retval = null;
  if (!empty($url))
  {
    // If exif_imagetype is not available, it will simply return the extension
    if (function_exists('exif_imagetype')) {
      $filetype = @exif_imagetype($url);
      if ($filetype) {
        if ($filetype == IMAGETYPE_GIF) { $retval = "gif"; }
        if ($filetype == IMAGETYPE_JPEG) { $retval = "jpg"; }
        if ($filetype == IMAGETYPE_PNG) { $retval = "png"; }
        if ($filetype == IMAGETYPE_ICO) { $retval = "ico"; }
        if ($filetype == IMAGETYPE_WEBP) { $retval = "webp"; }
        if ($filetype == IMAGETYPE_BMP) { $retval = "bmp"; }
      }
    } else {
      if (!$noFallback) { $retval = @preg_replace('/^.*\.([^.]+)$/D', '$1', $url); }
    }
  }
  return $retval;
}

function setGlobal($variable,$value = null) {
    $GLOBALS[$variable] = $value;
}

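function getGlobal($variable) {
    // return null instead of raising a warning for unset globals (e.g. curl_timeout)
    return isset($GLOBALS[$variable]) ? $GLOBALS[$variable] : null;
}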

Source: https://github.com/gaffling/PHP-Grab-Favicon
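The script above does a lot, so here is a much smaller sketch of the same lookup order (page `<link>`, then `/favicon.ico` in the domain root, then an API). It assumes a caller-supplied `$fetch` callable that returns the body as a string or `false`, which keeps it testable offline. `grab_favicon_url()` and `$fetch` are hypothetical names, not part of PHP Grab Favicon, and a DOM parser replaces the script's regex here.

```php
<?php
// Hypothetical helper: $fetch(string $url) returns the body as a string, or
// false when the URL can't be loaded (e.g. a wrapper around cURL).
function grab_favicon_url(string $pageUrl, callable $fetch): ?string
{
    $parts = parse_url($pageUrl);
    if (!isset($parts['scheme'], $parts['host'])) {
        return null;
    }
    $origin = $parts['scheme'] . '://' . $parts['host'];

    // 1. Look for <link rel="icon"> / <link rel="shortcut icon"> in the page
    $html = $fetch($pageUrl);
    if (is_string($html) && trim($html) !== '') {
        $doc = new DOMDocument();
        @$doc->loadHTML($html); // real-world HTML is rarely valid; hide warnings
        foreach ($doc->getElementsByTagName('link') as $link) {
            $tokens = preg_split('/\s+/', strtolower($link->getAttribute('rel')), -1, PREG_SPLIT_NO_EMPTY);
            $href = trim($link->getAttribute('href'));
            if ($href !== '' && in_array('icon', $tokens, true)) {
                // Absolute hrefs are kept; anything else is treated as relative
                // to the site root (a simplification: "//cdn..." and
                // page-relative paths are not handled)
                return parse_url($href, PHP_URL_SCHEME) ? $href : $origin . '/' . ltrim($href, '/');
            }
        }
    }

    // 2. Try the conventional favicon.ico in the domain root
    $ico = $origin . '/favicon.ico';
    if ($fetch($ico) !== false) {
        return $ico;
    }

    // 3. Last resort: let an API find it (Google's S2 service here)
    return 'http://www.google.com/s2/favicons?domain=' . $parts['host'];
}
```

In real use, `$fetch` could be as simple as `function ($u) { return @file_get_contents($u); }`.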

Upvotes: 0

rockstardev

Reputation: 13527

I find PHP Simple HTML DOM Parser to be more reliable than DOMDocument. So I use this instead:

require_once 'simple_html_dom.php';
$dom = new simple_html_dom();
$dom->load(file_get_contents($url));
$favicon = '';
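foreach($dom->find('link') as $e)
{
  if (empty($e->rel)) continue;
  $rel = strtolower(trim($e->rel));
  if ($rel == 'icon' || $rel == 'shortcut icon') {
    // use absolute (or protocol-relative) hrefs as-is; prefix relative ones with the page URL
    $favicon = parse_url($e->href, PHP_URL_HOST) ? $e->href : rtrim($url, '/').'/'.ltrim($e->href, '/');
  }
}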

print $favicon;
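Keep in mind that `rel` is a space-separated, case-insensitive list of tokens, so comparing the whole string against fixed values misses variants such as extra whitespace or other token combinations. A small sketch of a hypothetical helper that checks for the `icon` token instead; it could be used as `if (rel_has_icon($e->rel))` in the loop above.

```php
<?php
// Hypothetical helper: true if the rel attribute contains the token "icon".
// "icon", "Shortcut Icon" and "shortcut  icon" match; "apple-touch-icon"
// is a different token and does not.
function rel_has_icon($rel): bool
{
    if (!is_string($rel)) {
        return false; // missing attribute (null/false)
    }
    $tokens = preg_split('/\s+/', strtolower(trim($rel)), -1, PREG_SPLIT_NO_EMPTY);
    return in_array('icon', $tokens, true);
}
```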

Upvotes: 1

pc_

Reputation: 578

I changed Vivek's second method a bit and added this function. It looks like this:

<?php
$website = $_GET['u'];
$fevicon = getFavicon($website);
if ($fevicon !== '') {
    // escape the URL before putting it into HTML; <img> has no closing tag
    echo '<img src="' . htmlspecialchars(path_to_absolute($fevicon, $website)) . '">';
}

function getFavicon($site)
{
    $html = @file_get_contents($site);
    if ($html === false || $html === '') return '';
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $links = $dom->getElementsByTagName('link');
    $fevicon = '';

    for ($i = 0; $i < $links->length; $i++)
    {
        $link = $links->item($i);
        // compare case-insensitively: "icon", "Shortcut Icon", "shortcut icon", ...
        $rel = strtolower($link->getAttribute('rel'));
        if ($rel == 'icon' || $rel == 'shortcut icon')
        {
            $fevicon = $link->getAttribute('href');
        }
    }
    return $fevicon;
}

// transform to absolute path function...
function path_to_absolute($rel, $base)
{
    /* return if already absolute URL */
    if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;
    /* queries and anchors */
    if ($rel[0] == '#' || $rel[0] == '?') return $base . $rel;
    /* parse base URL and convert to local variables:
       $scheme, $host, $path */
    extract(parse_url($base));
    /* protocol-relative URL ("//cdn.example.com/icon.png") */
    if (substr($rel, 0, 2) == '//') return $scheme . ':' . $rel;
    /* remove non-directory element from path (the base may have no path at all) */
    $path = isset($path) ? preg_replace('#/[^/]*$#', '', $path) : '';
    /* destroy path if relative url points to root */
    if ($rel[0] == '/') $path = '';
    /* dirty absolute URL */
    $abs = "$host$path/$rel";
    /* replace '//' or '/./' or '/foo/../' with '/' */
    $re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
    for ($n = 1; $n > 0; $abs = preg_replace($re, '/', $abs, -1, $n)) {}
    /* absolute URL is ready! */
    return $scheme . '://' . $abs;
}

?>

You call it with https://www.domain.tld/favicon/this_script.php?u=http://www.example.com. It still can't catch every case, but relative paths are now resolved to absolute URLs. Hope it helps.
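Because the script passes `$_GET['u']` straight to `file_get_contents()`, a caller could also point it at local files (e.g. `?u=/etc/passwd`) or other stream wrappers. A minimal validation sketch, assuming only http/https pages should ever be fetched; `normalize_site_url()` is a hypothetical helper.

```php
<?php
// Hypothetical helper: returns a validated http(s) URL, or null.
// Bare domains like "example.com" get "http://" prepended (an assumption).
function normalize_site_url(string $input): ?string
{
    $input = trim($input);
    // Add a scheme if none is present
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $input)) {
        $input = 'http://' . $input;
    }
    if (filter_var($input, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    // Reject file://, ftp://, php:// and other non-web schemes
    $scheme = strtolower((string) parse_url($input, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true) ? $input : null;
}
```

For example: `$website = normalize_site_url($_GET['u'] ?? ''); if ($website === null) exit('Invalid URL');`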

Upvotes: 0

Vince

Reputation: 3334

See this answer: https://stackoverflow.com/a/22771267. It's an easy-to-use PHP class that gets the favicon URL and downloads it, and it also gives you some information about the favicon, like the file type or how the favicon was found (default URL, <link> tag...):

<?php
require 'FaviconDownloader.class.php';
$favicon = new FaviconDownloader('https://code.google.com/p/chromium/issues/detail?id=236848');

if($favicon->icoExists){
    echo "Favicon found : ".$favicon->icoUrl."\n";

    // Saving favicon to file
    $filename = 'favicon-'.time().'.'.$favicon->icoType;
    file_put_contents($filename, $favicon->icoData);
    echo "Saved to ".$filename."\n\n";
} else {
    echo "No favicon for ".$favicon->url."\n\n";
}

$favicon->debug();
/*
FaviconDownloader Object
(
    [url] => https://code.google.com/p/chromium/issues/detail?id=236848
    [pageUrl] => https://code.google.com/p/chromium/issues/detail?id=236848
    [siteUrl] => https://code.google.com/
    [icoUrl] => https://ssl.gstatic.com/codesite/ph/images/phosting.ico
    [icoType] => ico
    [findMethod] => head absolue_full
    [error] => 
    [icoExists] => 1
    [icoMd5] => a6cd47e00e3acbddd2e8a760dfe64cdc
)
*/
?>

Upvotes: 1

Thamaraiselvam

Reputation: 7080

$url = 'http://thamaraiselvam.strikingly.com/';
$doc = new DOMDocument();
$doc->strictErrorChecking = FALSE;
@$doc->loadHTML(file_get_contents($url));
$xml = simplexml_import_dom($doc);
$arr = $xml->xpath('//link[@rel="shortcut icon" or @rel="icon"]');
if (!empty($arr[0]['href'])) {
    echo '<img src="' . htmlspecialchars((string) $arr[0]['href']) . '">';
} else {
    echo '<img src="' . htmlspecialchars(rtrim($url, '/')) . '/favicon.ico">';
}

Upvotes: 0

edelwater

Reputation: 2802

Found this thread... I have written a WordPress plugin that covers many variations of retrieving the favicon. Since there are so many, here is the GPL code: http://plugins.svn.wordpress.org/wp-favicons/trunk/

It lets you run a server from which any client can request icons via XML-RPC requests. It has a plugin structure, so you can try Google, getfavicon, etc. to see whether one of these services delivers anything. If not, it switches to an icon-fetching mode that takes all HTTP statuses (301/302/404) into account and does its best to find an icon anywhere. After that, it uses image library functions to check inside the file whether it really is an image and what kind of image it is (sometimes the extension is wrong). The pipeline is pluggable, so you can add image conversions or extra functionality afterwards.

The HTTP fetching file contains logic similar to what I see above: http://plugins.svn.wordpress.org/wp-favicons/trunk/includes/server/class-http.php

But it is only part of the pipeline.

It can get pretty complex once you dive into it.
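The "check inside the file" step can be approximated without an image library by looking at the file's leading magic bytes. A hedged sketch covering only the usual favicon formats; `icon_type_from_bytes()` is a hypothetical name and this is not the plugin's code.

```php
<?php
// Hypothetical helper: guess the icon format from the first bytes of the
// downloaded data instead of trusting the file extension.
function icon_type_from_bytes(string $data): ?string
{
    if (strncmp($data, "\x89PNG\r\n\x1a\n", 8) === 0) return 'png';
    if (strncmp($data, "\x00\x00\x01\x00", 4) === 0) return 'ico';
    if (strncmp($data, 'GIF87a', 6) === 0 || strncmp($data, 'GIF89a', 6) === 0) return 'gif';
    if (strncmp($data, "\xFF\xD8\xFF", 3) === 0) return 'jpg';
    if (strncmp($data, 'RIFF', 4) === 0 && substr($data, 8, 4) === 'WEBP') return 'webp';
    if (strncmp($data, 'BM', 2) === 0) return 'bmp';
    // SVG is text, so look for an <svg tag near the start
    if (stripos(substr($data, 0, 1024), '<svg') !== false) return 'svg';
    return null; // e.g. an HTML error page served instead of an image
}
```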

Upvotes: 0

Vivek

Reputation: 1493

First method: check whether favicon.ico exists; if it is found, show it, otherwise report that it was not found.

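<?php
        $userPath = $_POST["url"];
        $path = "http://www." . $userPath . "/favicon.ico";
        $header = @get_headers($path);
        // get_headers() returns false on failure; look for a 200 status code
        if ($header && strpos($header[0], ' 200') !== false)
        {
            echo '<img src="' . htmlspecialchars($path) . '">';
        }
        else
        {
            echo "<span class=error>Not found</span>";
        }
    ?>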
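One caveat with checking `$header[0]`: `get_headers()` returns `false` on failure, and when the server redirects (for example http to https), `$header[0]` is the redirect's status line while the final response's status comes later in the array. A sketch of a hypothetical helper that returns the last status code instead:

```php
<?php
// Hypothetical helper: returns the status code of the last response in a
// get_headers() result, or null if there is none.
function final_http_status($headers): ?int
{
    if (!is_array($headers)) {
        return null; // get_headers() failed
    }
    $status = null;
    foreach ($headers as $key => $line) {
        // Status lines have integer keys and look like "HTTP/1.1 200 OK"
        if (is_int($key) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}
```

Usage: `if (final_http_status(@get_headers($path)) === 200) { ... }`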

In the other method, you search the page for the icon <link> and use that icon file:

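<?php
$website = $_POST["url"];
$fevicon = getFavicon($website);
if ($fevicon === '') $fevicon = 'favicon.ico'; // no <link> found: fall back to the root favicon.ico
// use absolute (or protocol-relative) hrefs as-is; prefix relative ones with the site root
$src = parse_url($fevicon, PHP_URL_HOST) ? $fevicon : 'http://www.'.$website.'/'.ltrim($fevicon, '/');
echo '<img src="'.htmlspecialchars($src).'">';
function getFavicon($site)
{
    $html = @file_get_contents("http://www.".$site);
    if ($html === false || $html === '') return '';
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $links = $dom->getElementsByTagName('link');
    $fevicon = '';

    for ($i = 0; $i < $links->length; $i++)
    {
        $link = $links->item($i);
        // compare case-insensitively: "icon", "Shortcut Icon", "shortcut icon", ...
        $rel = strtolower($link->getAttribute('rel'));
        if ($rel == 'icon' || $rel == 'shortcut icon')
        {
            $fevicon = $link->getAttribute('href');
        }
    }
    return $fevicon;
}
?>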

Upvotes: 2

Malitta N

Reputation: 3423

I've been doing something similar and tested this with a bunch of URLs; all of them seemed to work. The URL doesn't have to be a base URL.

function getFavicon($url){
    # make the URL simpler
    $elems = parse_url($url);
    $url = $elems['scheme'].'://'.$elems['host'];

    # load site
    $output = file_get_contents($url);

    # look for the shortcut icon inside the loaded page
    $regex_pattern = "/rel=\"shortcut icon\" (?:href=[\'\"]([^\'\"]+)[\'\"])?/";
    preg_match_all($regex_pattern, $output, $matches);

    # the href group is optional, so it can be set but empty: use !empty(), not isset()
    if(!empty($matches[1][0])){
        $favicon = $matches[1][0];

        # check if absolute url or relative path
        $favicon_elems = parse_url($favicon);

        # if relative
        if(!isset($favicon_elems['host'])){
            $favicon = $url . '/' . ltrim($favicon, '/');
        }

        return $favicon;
    }

    return false;
}
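Appending the href to the site root works for `/favicon.png`, but not for hrefs relative to the page's directory (`img/fav.ico`) or containing `..` segments. A hedged sketch of a more complete resolver, assuming `$base` is the absolute URL of the page the `<link>` came from; `resolve_url()` is a hypothetical name and it is not a full RFC 3986 implementation.

```php
<?php
// Hypothetical helper: resolve $href against the absolute page URL $base.
// Handles absolute, protocol-relative, root-relative and path-relative
// hrefs, including "." and ".." segments.
function resolve_url(string $href, string $base): string
{
    if (parse_url($href, PHP_URL_SCHEME) !== null) {
        return $href; // already absolute
    }
    $b = parse_url($base);
    if (strpos($href, '//') === 0) {
        return $b['scheme'] . ':' . $href; // protocol-relative
    }
    $origin = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');

    if ($href !== '' && $href[0] === '/') {
        $path = $href; // root-relative
    } else {
        // Relative to the base page's directory ("/blog/post.html" -> "/blog/")
        $dir = preg_replace('#/[^/]*$#', '/', $b['path'] ?? '/');
        $path = $dir . $href;
    }

    // Drop "." segments; ".." removes the previous segment
    $segments = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            if (count($segments) > 1) {
                array_pop($segments); // never pop the leading "" (the root)
            }
        } elseif ($segment !== '.') {
            $segments[] = $segment;
        }
    }
    return $origin . implode('/', $segments);
}
```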

Upvotes: 3

Jaime Bellmyer

Reputation: 23307

I've implemented a favicon grabber of my own, and I detailed the usage in another StackOverflow post here: Get website's favicon with JS

Thanks, and let me know if it helps you. Also, any feedback is greatly appreciated.

Upvotes: 2

Starx

Reputation: 78991

Use the S2 service provided by Google. It is as simple as this:

http://www.google.com/s2/favicons?domain=www.yourdomain.com

Scraping this would be much easier than trying to do it yourself.
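The `domain` parameter takes a host name, so if users type full URLs it helps to extract the host first. A small sketch with a hypothetical `google_s2_favicon_url()` helper, assuming the S2 URL format shown above:

```php
<?php
// Hypothetical helper: build the S2 URL from a full URL or a bare domain.
function google_s2_favicon_url(string $siteUrl): ?string
{
    // parse_url() only finds a host when a scheme is present
    if (strpos($siteUrl, '://') === false) {
        $siteUrl = 'http://' . $siteUrl;
    }
    $host = parse_url($siteUrl, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    return 'http://www.google.com/s2/favicons?' . http_build_query(['domain' => $host]);
}
```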

Upvotes: 53

vooD

Reputation: 2921

Quick and dirty:

<?php
$url = 'http://example.com/';
$doc = new DOMDocument();
$doc->strictErrorChecking = FALSE;
@$doc->loadHTML(file_get_contents($url)); // @ hides warnings about invalid real-world HTML
$xml = simplexml_import_dom($doc);
$arr = $xml->xpath('//link[@rel="shortcut icon" or @rel="icon"]');
echo isset($arr[0]['href']) ? $arr[0]['href'] : 'no icon link found';

Upvotes: 43

Marcel

Reputation: 28087

It looks like http://www.getfavicon.org/?url=domain.com (FAQ) reliably scrapes a website's favicon. I realise it's a 3rd-party service but I think it's a worthy alternative to the Google favicon service.

Upvotes: 5

mdec

Reputation: 5242

According to Wikipedia, there are two major methods websites can use to have a favicon picked up by a browser. The first, as Steve mentioned, is storing the icon as favicon.ico in the root directory of the web server. The second is referencing the favicon via the HTML link tag.

To cover all of these cases, the best idea would be to test for the presence of the favicon.ico file first, and if it is not present, search the source (limited to the HTML head node) for either <link rel="icon" or <link rel="shortcut icon" until you find the favicon. It is up to you whether you use regex or some other string search option (not to mention the built-in PHP ones). Finally, this question may be of some help to you.
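A hedged sketch of the head-limited search using DOMXPath instead of regex. The XPath lower-cases `rel` and matches the whole `icon` token, so `icon` and `Shortcut Icon` match but `apple-touch-icon` does not. `icon_hrefs_in_head()` is a hypothetical name, and the hrefs it returns may still be relative.

```php
<?php
// Hypothetical helper: hrefs of every <link> inside <head> whose rel
// contains the token "icon" (case-insensitive), in document order.
function icon_hrefs_in_head(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real pages are rarely valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // XPath 1.0 has no lower-case(), so translate() folds the case; padding
    // with spaces turns contains(..., ' icon ') into a whole-token test
    $query = "//head/link[contains(concat(' ', normalize-space(translate(@rel,"
           . " 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')), ' '), ' icon ')]/@href";
    $hrefs = [];
    foreach ((new DOMXPath($doc))->query($query) as $attr) {
        $hrefs[] = trim($attr->nodeValue);
    }
    return $hrefs;
}
```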

Upvotes: 3

Steve Mayne

Reputation: 22818

If you want to retrieve the favicon from a particular website, you simply need to fetch favicon.ico from the root of their website. Like so:

$domain = "www.example.com";
$url = "http://".$domain."/favicon.ico";
$icondata = file_get_contents($url);

... you can now do what you like with the icon data
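If you start from an arbitrary page URL rather than a bare domain, here is a sketch of building the root favicon.ico URL that keeps the page's scheme and port instead of hard-coding `http://` (`root_favicon_url()` is a hypothetical name):

```php
<?php
// Hypothetical helper: favicon.ico at the root of the host that served $pageUrl.
function root_favicon_url(string $pageUrl): ?string
{
    $p = parse_url($pageUrl);
    if (!isset($p['scheme'], $p['host'])) {
        return null; // e.g. a bare "www.example.com" has no scheme/host parts
    }
    $port = isset($p['port']) ? ':' . $p['port'] : '';
    return $p['scheme'] . '://' . $p['host'] . $port . '/favicon.ico';
}
```

Note that `file_get_contents()` returns `false` when the request fails (for example on a 404), so check `$icondata !== false` before using the data.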

Upvotes: -1
