Banshee

Reputation: 15817

Getting the thumbnail image from a Web page

I have C# code for fetching images from URLs like http://i.imgur.com/QvkaduU.jpg, but how would I fetch the image from Web pages like this: http://imgur.com/gallery/QvkaduU?

Is there any "easy" way to do this, or will I have to fetch the HTML and write a C# parser that looks through the HTML for the image that is bigger than all the others?

Let me clear this up. If you paste http://imgur.com/gallery/QvkaduU (HTML version) into, for example, Facebook's status update field, it will find the main image and make a thumbnail out of it. This is exactly the behavior I'm looking for. The question is, how is this done? Do I have to write my own HTML parser, or is there an easy way to get this?

Upvotes: 6

Views: 5093

Answers (6)

Chibueze Opata

Reputation: 10054

You're already on the right track. Yes, the most reliable way is to fetch the HTML, parse it, and look for images, then rank the images by position and size. For instance, if the first image you find is big enough to make the thumbnail, use it; if it is too small, move on to the next image, and so on. I'd also advise using an image plugin like Timthumb (I think I've seen an ASP.NET version at some point) and caching the images, so that once you've looked up the thumbnail for a website, you can serve the image(s) from the cache instead.
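A rough sketch of the ranking and caching idea, assuming the image URLs and pixel sizes have already been collected in document order. The names ImageCandidate, ThumbnailPicker and ThumbnailCache are made up for illustration, and the "first image that is big enough, otherwise the largest one" rule is just one possible reading of the ranking described above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical container for one image found on the page.
public sealed class ImageCandidate
{
    public ImageCandidate(string url, int width, int height)
    {
        Url = url;
        Width = width;
        Height = height;
    }

    public string Url { get; }
    public int Width { get; }
    public int Height { get; }
}

public static class ThumbnailPicker
{
    // Returns the first candidate (in document order) that meets the minimum
    // size; if none does, falls back to the candidate with the largest area.
    // Returns null for an empty list.
    public static ImageCandidate Pick(
        IList<ImageCandidate> inDocumentOrder, int minWidth, int minHeight)
    {
        ImageCandidate largest = null;
        foreach (ImageCandidate c in inDocumentOrder)
        {
            if (c.Width >= minWidth && c.Height >= minHeight)
                return c;

            if (largest == null ||
                (long)c.Width * c.Height > (long)largest.Width * largest.Height)
                largest = c;
        }
        return largest;
    }
}

// Hypothetical in-memory cache: page URL -> chosen thumbnail URL.
public sealed class ThumbnailCache
{
    private readonly ConcurrentDictionary<string, string> _byPage =
        new ConcurrentDictionary<string, string>();

    // Runs the resolver only when the page has not been looked up before.
    public string GetOrResolve(string pageUrl, Func<string, string> resolve)
    {
        return _byPage.GetOrAdd(pageUrl, resolve);
    }
}
```

The cache here lives in memory only; a real site would probably persist the chosen thumbnails somewhere.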

Upvotes: 1

Martin Braun

Reputation: 12609

I would fetch the whole HTML source, extract every <img ... src="..."> attribute and every inline CSS < ... style="... background-image: ...;"> property with a regex, and temporarily download all the files behind those links. Then I would (try to convert each one to a Bitmap and) check its pixel size; the largest picture should be the one you want.

Google should help you find out how to check the pixel size and convert images.

The regex to get all image links from an HTML source should be

<img[^>]+src=\"([^"]+)\".*?>|<[^>]+style=\"[^"]*background-image:\s*url\(\s*'?([^')]+)\s*'?\)\s*;.*?> (not tested, but pretty sure)

The link will be in capture group 1 or 2 (group 0 is the whole match). Also, don't forget to resolve relative links against the URL of the current page.
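A minimal sketch of the regex approach in C#, assuming a simplified pattern of my own (it does not require the closing > or a trailing semicolon) and a made-up class name, ImageLinkExtractor. Relative links are resolved against the page URL with Uri.TryCreate. A regex will miss plenty of real-world markup, so treat this as a starting point only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImageLinkExtractor
{
    // Group 1: src of an <img> tag.
    // Group 2: url(...) of an inline background-image declaration.
    private static readonly Regex ImagePattern = new Regex(
        "<img[^>]+src=\"([^\"]+)\"|background-image:\\s*url\\(\\s*'?([^')]+?)\\s*'?\\)",
        RegexOptions.IgnoreCase);

    // Returns absolute URLs for every image reference the pattern finds.
    public static List<Uri> Extract(string html, Uri pageUrl)
    {
        var result = new List<Uri>();
        foreach (Match m in ImagePattern.Matches(html))
        {
            string raw = m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;

            // Resolves relative links against the page URL; absolute links pass through.
            Uri absolute;
            if (Uri.TryCreate(pageUrl, raw.Trim(), out absolute))
                result.Add(absolute);
        }
        return result;
    }
}
```

Each returned Uri can then be downloaded and measured to find the largest picture.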

Upvotes: 1

Rich

Reputation: 15465

There is no easy way to get a "good" thumbnail image for an arbitrary URL.

Facebook's algorithm for doing so is fairly complex. Page developers are able to give it a hint by adding various meta tags to the <head>, including:

<meta property="og:image" content="http://url_to_your_image_here" />

or

<link rel="image_src" href="http://www.code-digital.co.uk/preview.jpg" />

(more on this)

... so if you wanted to replicate Facebook's algorithm, you would need to fetch the page source, parse it for any "hints" like the one above (you'd better check that I haven't missed any other "hint" formats), and come up with a fallback algorithm if the page doesn't include one of those.
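As a rough illustration of the "hint" step only, here is a sketch that scans already-downloaded HTML for the two tags shown above. It is not Facebook's actual algorithm; the class name ThumbnailHints and the choice to prefer og:image over image_src are my own assumptions, and the attribute matching is a simple regex rather than a real HTML parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class ThumbnailHints
{
    private static readonly Regex TagPattern =
        new Regex("<(meta|link)\\b[^>]*>", RegexOptions.IgnoreCase);

    // Returns the hinted image URL, or null when the page carries no hint.
    // Assumption: og:image wins over image_src when both are present.
    public static string FindHint(string html)
    {
        string fallback = null;
        foreach (Match tag in TagPattern.Matches(html))
        {
            bool isMeta = tag.Groups[1].Value.Equals("meta", StringComparison.OrdinalIgnoreCase);
            if (isMeta)
            {
                if (IsValue(Attr(tag.Value, "property"), "og:image"))
                {
                    string content = Attr(tag.Value, "content");
                    if (!string.IsNullOrEmpty(content))
                        return content;
                }
            }
            else if (fallback == null && IsValue(Attr(tag.Value, "rel"), "image_src"))
            {
                fallback = Attr(tag.Value, "href");
            }
        }
        return fallback;
    }

    private static bool IsValue(string actual, string expected)
    {
        return string.Equals(actual, expected, StringComparison.OrdinalIgnoreCase);
    }

    // Reads one quoted attribute value from a single tag, or null if absent.
    private static string Attr(string tag, string name)
    {
        Match m = Regex.Match(
            tag,
            "\\s" + Regex.Escape(name) + "\\s*=\\s*[\"']([^\"']*)[\"']",
            RegexOptions.IgnoreCase);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : null;
    }
}
```

When FindHint returns null, you are back to the fallback problem: picking an image from the page body yourself.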

A more realistic solution would be to use someone else's URL -> thumbnail system.

If you like Facebook's version, I think you should be able to request Facebook's thumbnail for a given URL via their API.

Other services which offer this sort of thing are:

Upvotes: 7

overflowedstack

Reputation: 31

If the QvkaduU part is always the same between the HTML page and the image, could you just do a string replacement?

"http://imgur.com/gallery/QvkaduU".Replace("imgur.com/gallery", "i.imgur.com") + ".jpg";

Upvotes: 1

Jhigs

Reputation: 147

Can you try this?

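// Requires: using System.Drawing; using System.IO; using System.Net;
public Bitmap getImageFromURL(String sURL)
{
    HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(sURL);
    myRequest.Method = "GET";

    // Dispose the response and its stream even if decoding the image fails.
    using (HttpWebResponse myResponse = (HttpWebResponse)myRequest.GetResponse())
    using (Stream stream = myResponse.GetResponseStream())
    using (Bitmap streamed = new Bitmap(stream))
    {
        // Copy the bitmap so the result does not depend on the closed stream.
        return new Bitmap(streamed);
    }
}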

Taken from How to get an image to a pictureBox from an URL? (Windows Mobile)

Upvotes: 0

PM_ME_YOUR_CODE

Reputation: 321

Can you try to do something like this?

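public void ProcessRequest(HttpContext context)
{
    // Load the image here (elided in this answer); the line below
    // assumes the bytes end up in a byte[] named imageData.
    ....
    // Then send it to the browser.
    context.Response.OutputStream.Write(imageData, 0, imageData.Length);
}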

You can also try what they are talking about here. I tried it and it worked like a charm.

http://www.dotnetspider.com/resources/42565-Download-images-from-URL-using-C.aspx

Upvotes: 0
