dyip1
dyip1

Reputation: 257

How do sites like Bing Search, Imgur, and Reddit generate a thumbnail of the website from a URL?

In Imgur, you can input an image URL and a few seconds later, there's a thumbnail of the image. Or in Bing Search, you can (or used to) be able to view a thumbnail of the website in the search results before visiting it.

I would love to implement something similar for my website, but I can't wrap my head around on how it is done. Moreover, are there not security concerns? I'd imagine the servers have to at least download the website, render it and take a screenshot. What if it's a malicious website, and you download something malicious on your server?

Upvotes: 3

Views: 807

Answers (2)

Vasiliy Faronov
Vasiliy Faronov

Reputation: 12310

A headless Web browser engine like PhantomJS can be used for this. See example on their wiki. Yes, it would be prudent to run this in some sort of a sandbox, feeding a queue of URLs into it, then taking the generated thumbnails from the file system.

Upvotes: 2

Yuriy Babenko
Yuriy Babenko

Reputation: 178

While I don't know the internal workings of any of the aforementioned services, I'd guess that they download/create a local copy of the images and generate a thumbnail from that.

Imgur, as an image hosting service, definitely needs a copy of the image prior to being able to generate thumbnails or anything else from it. The image may be stored locally or just in memory, but either way, it must be downloaded.

The search engines displaying screenshots of the sites likely have services that periodically take a screenshot of the viewable area when the content is getting indexed, and then serve those screenshots (or derivatives) along with the search results. Taking a screenshot really isn't dangerous, so there's nothing to worry about there, and whatever tools are used to load/parse/index the websites will obviously be written with security considerations in mind.

Of course, there are security concerns about the data you're downloading, too; the images can easily contain executable code (such as PHP) in their EXIF data, so you need to be careful about what you do with the images and how.

Upvotes: 0

Related Questions