Reputation: 152206
I am wondering if it is possible to get a website's favicon from a URL with JavaScript.
For example, I have the URL http://www.bbc.co.uk/ and I would like to get the path to the favicon described in its <link rel="icon" .../> tag: http://www.bbc.co.uk/favicon.ico.
I have many URLs, so I think the solution should not load every page and search it for the link tag.
Any ideas?
Upvotes: 41
Views: 34558
Reputation: 41
I think a short update on which techniques for getting favicons still work is useful here.
As far as I know, the Google and DuckDuckGo solutions still work.
The accepted solution using YQL does not work in all cases. It looks for a <link> tag, but favicons are more complex than that. For instance, a site can serve a default /favicon.ico without any reference to it in the markup. As another example, the HTML <base> element defines a default base URL for all relative links in the page, including favicons. You can find more on the different techniques to define a favicon here.
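The two cases above (a <base> element and an unreferenced /favicon.ico) can be sketched as follows. This is only a minimal illustration, not an exhaustive resolver: `faviconCandidates` is a hypothetical helper name, it assumes you already have the page's HTML as a string, and it uses regular expressions instead of a real HTML parser to stay dependency-free.

```javascript
// Hypothetical helper: lists candidate favicon URLs for a page.
// Regex-based tag matching is an assumption made to keep the sketch
// dependency-free; a real HTML parser is more robust.
function faviconCandidates(pageUrl, html) {
  // A <base href> changes how relative URLs in the page resolve.
  const baseTag = html.match(/<base\b[^>]*\bhref\s*=\s*["']([^"']+)["']/i);
  const baseUrl = baseTag ? new URL(baseTag[1], pageUrl).href : pageUrl;

  const candidates = [];
  for (const tag of html.match(/<link\b[^>]*>/gi) || []) {
    const rel = tag.match(/\brel\s*=\s*["']([^"']*)["']/i);
    const href = tag.match(/\bhref\s*=\s*["']([^"']+)["']/i);
    if (!rel || !href) continue;
    // rel is a space-separated, case-insensitive list of tokens,
    // so "icon", "ICON" and "shortcut icon" all qualify.
    const tokens = rel[1].toLowerCase().split(/\s+/);
    if (tokens.includes("icon")) {
      candidates.push(new URL(href[1], baseUrl).href);
    }
  }
  // Fallback: the conventional location at the site root.
  candidates.push(new URL("/favicon.ico", pageUrl).href);
  return candidates;
}
```

The declared icons come first and the root /favicon.ico is always appended last, so a caller can try the candidates in order and stop at the first one that actually loads.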
I would recommend using an existing library, but they are usually not exhaustive. Here is a project in JavaScript: favicongrabber.com. But in my experience, libraries in other languages are more exhaustive:
Upvotes: 0
Reputation: 2325
For the record, DuckDuckGo also has a 'hidden' favicon service:
https://icons.duckduckgo.com/ip3/www.google.com.ico
https://external-content.duckduckgo.com/ip3/www.google.com.ico
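Building such a URL from JavaScript could look like the sketch below. The URL pattern is copied from the first example above; `duckDuckGoFaviconUrl` is a hypothetical helper name, and since the service is undocumented the pattern may change without notice.

```javascript
// Builds a DuckDuckGo icon URL from a page URL or a bare hostname.
// The "/ip3/<host>.ico" pattern is taken from the examples above; it is
// not a documented API, so treat it as an assumption.
function duckDuckGoFaviconUrl(urlOrHost) {
  const host = urlOrHost.includes("://")
    ? new URL(urlOrHost).hostname
    : urlOrHost;
  return "https://icons.duckduckgo.com/ip3/" + host + ".ico";
}
```

The result can be assigned directly to an image's `src`, for example `img.src = duckDuckGoFaviconUrl("http://www.bbc.co.uk/")`.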
Upvotes: 0
Reputation: 3048
These days I find that GitHub's service does a much better job than Google's:
https://favicons.githubusercontent.com/microsoft.com
Though neither is perfect, it seems. For Stack Overflow:
For GitHub:
Here is an article I wrote about a solution that can fetch favicons from multiple sources.
Here is the source code:
<!DOCTYPE html>
<html>
<body style="background-color:grey;">
<script type="text/javascript">
const KRequestFaviconGitHub = 'https://favicons.githubusercontent.com/';
const KRequestFaviconGoogle = 'https://www.google.com/s2/favicons?domain=';
const KDefaultUrl = KRequestFaviconGoogle;
// We rely on pre-defined hostname configurations
const hostnames = {
"stackoverflow.com": { url:KRequestFaviconGoogle+"stackoverflow.com", invert:0 },
"theregister.co.uk": { url:KRequestFaviconGoogle+"theregister.co.uk", invert:1 },
"github.com": { url:KRequestFaviconGitHub+"github.com", invert:1 },
"android.googlesource.com": { url:KRequestFaviconGoogle+"googlesource.com", invert:0 },
"developer.android.com": { url:KRequestFaviconGitHub+"developer.android.com", invert:0 }
};
document.addEventListener('DOMContentLoaded', function(event) {
addFavicon("stackoverflow.com");
addFavicon("bbc.co.uk");
addFavicon("github.com");
addFavicon("theregister.co.uk");
addFavicon("developer.android.com");
addFavicon("android-doc.github.io");
addFavicon("slions.net");
addFavicon("alternate.de");
addFavicon("amazon.de");
addFavicon("microsoft.com");
addFavicon("apple.com");
addFavicon("googlesource.com");
addFavicon("android.googlesource.com");
addFavicon("firebase.google.com");
addFavicon("play.google.com");
addFavicon("google.com");
addFavicon("team-mediaportal.com");
addFavicon("caseking.de");
addFavicon("developer.mozilla.org");
addFavicon("theguardian.com");
addFavicon("niche-beauty.com");
addFavicon("octobre-editions.com");
addFavicon("dw.com");
addFavicon("douglas.com");
addFavicon("douglas.de");
addFavicon("www.sncf.fr");
addFavicon("paris.fr");
addFavicon("bahn.de");
addFavicon("hopfully.that.domain.does.not.exists.nowaythisisavaliddomain.fart");
});
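/**
 * Appends a link to aDomain, prefixed with its favicon, to the page.
 * Uses the per-hostname configuration when one exists, KDefaultUrl otherwise.
 */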
function addFavicon(aDomain)
{
var a = document.createElement("a");
a.href = "http://" + aDomain;
//a.style.display = "block";
var div = document.createElement("div");
div.innerText = aDomain;
div.style.verticalAlign = "middle";
div.style.display = "inline-block";
var img = document.createElement("img");
img.className = "link-favicon";
img.style.width = "16px";
img.style.height = "16px";
img.style.verticalAlign = "middle";
img.style.display = "inline-block";
img.style.marginRight = "4px";
a.prepend(img);
a.appendChild(div);
document.body.appendChild(a);
document.body.appendChild(document.createElement("p"));
const conf = hostnames[aDomain];
if (conf==null)
{
img.src = KDefaultUrl+aDomain;
}
else
{
img.src = conf.url;
img.style.filter = "invert(" + conf.invert + ")";
}
}
</script>
</body>
</html>
Upvotes: 0
Reputation: 40525
Here are 2 working options. I tested over 100 URLs and got different results with each option.
Please note, this solution is not JS, but JS may not be necessary.
<!-- Free -->
<img height="16" width="16" src='http://www.google.com/s2/favicons?domain=www.edocuments.co.uk' />
<!-- Paid -->
<img height="16" width="16" src='http://grabicon.com/edocuments.co.uk' />
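If you do need JavaScript, for example because you have a long list of page URLs, a sketch like the following builds one Google favicon URL per distinct hostname without loading any of the pages. `googleFaviconUrls` is a hypothetical helper name; the endpoint is the same one used in the free option above, written with https.

```javascript
// Maps each distinct hostname in pageUrls to a Google favicon URL.
// Only the hostname is sent to the service, so several pages on the
// same host share a single entry.
function googleFaviconUrls(pageUrls) {
  const byHost = new Map();
  for (const pageUrl of pageUrls) {
    const host = new URL(pageUrl).hostname;
    if (!byHost.has(host)) {
      byHost.set(
        host,
        "https://www.google.com/s2/favicons?domain=" + encodeURIComponent(host)
      );
    }
  }
  return byHost;
}
```

Each value can then be used as the `src` of a 16x16 `<img>`, exactly like the static markup above.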
Upvotes: 59
Reputation: 2802
After 30,000 to 40,000 tests I noticed that you really encounter lots of different situations that have to be handled.
The starting point is of course to look only at the rel attribute of the link tag and fetch the icon it points to, but along the way you will find more and more situations you have to cover.
In case anyone looking at this thread tries to get closer to 100% perfection, I uploaded my (PHP) code here: https://plugins.svn.wordpress.org/wp-favicons/trunk/includes/server/class-http.php. This is part of a (GPL) WordPress plugin that retrieves favicons, written more or less on request back then because of the limitations of the standard Google service (as mentioned above). The code finds substantially more icons than Google's does, but it also includes Google and others as image providers to shortcut further iterations when trying to retrieve the icon.
When you read through the code you will probably see some of the situations you will encounter, e.g. base64 data URIs, pages redirecting to 404 pages or redirecting a gazillion times, weird HTTP status codes and having to check every possible HTTP return code for validity, icons served with the wrong MIME type, client-side refresh tags, icons in the root folder with none referenced in the HTML, etc.
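To illustrate just one of these situations, the wrong MIME type, here is a minimal JavaScript sketch (not taken from the PHP code above) that ignores the Content-Type header and checks the first bytes of the downloaded file instead. `sniffIconType` is a hypothetical helper name, and it only knows a few common raster formats; SVG icons, for instance, are not detected.

```javascript
// Guesses the image format from the leading "magic" bytes of a file.
// Returns "ico", "png", "gif", "jpeg", or null when nothing matches
// (e.g. an HTML error page served in place of the icon).
function sniffIconType(bytes) {
  const signatures = [
    { type: "ico", magic: [0x00, 0x00, 0x01, 0x00] },
    { type: "png", magic: [0x89, 0x50, 0x4e, 0x47] },
    { type: "gif", magic: [0x47, 0x49, 0x46, 0x38] },
    { type: "jpeg", magic: [0xff, 0xd8, 0xff] },
  ];
  for (const { type, magic } of signatures) {
    // Inputs shorter than the signature fail the comparison naturally.
    if (magic.every((b, i) => bytes[i] === b)) return type;
  }
  return null;
}
```

A caller would pass the start of the response body, for example a `Uint8Array` view of the first few bytes, and discard the candidate when the result is `null`.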
If you go up a directory you will find other classes that are meant to store the actual icons against their URL (and of course you will then need to find out which "branches" use the same favicon and which do not, and whether they belong to the same "owner" or are really different parts under the same domain).
Upvotes: 1
Reputation: 15045
You could use YQL for that
http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D"http://bbc.co.uk/"and%20xpath%3D"/html/head/link[@rel%3D'icon']%20|%20/html/head/link[@rel%3D'ICON']%20|%20/html/head/link[@rel%3D'shortcut%20icon']%20|%20/html/head/link[@rel%3D'SHORTCUT%20ICON']"&format=json&callback=grab
This query is used by the Display Feed Favicons Greasemonkey script.
You can write queries in the YQL console, but it requires you to log in (by the way, running queries does not):
http://developer.yahoo.com/yql/console/#h=select%20*%20from%20html%20where%20url%3D%22http%3A//bbc.co.uk/%22and%20xpath%3D%22/html/head/link%5B@rel%3D%27icon%27%5D%20%7C%20/html/head/link%5B@rel%3D%27ICON%27%5D%20%7C%20/html/head/link%5B@rel%3D%27shortcut%20icon%27%5D%20%7C%20/html/head/link%5B@rel%3D%27SHORTCUT%20ICON%27%5D%22
It is better than http://www.google.com/s2/favicons?domain=www.domain.com when a favicon exists but is not located at domain.com/favicon.ico.
Upvotes: 17
Reputation: 152206
I just found something called Google Shared Stuff that returns an image with a website's favicon by hostname:
http://www.google.com/s2/favicons?domain=www.domain.com
But for the BBC site it returns a rather small favicon. Compare:
http://www.google.com/s2/favicons?domain=www.bbc.co.uk
http://www.bbc.co.uk/favicon.ico
Upvotes: 28