ben

Reputation: 29777

If a URL doesn't have a filename in it, can I determine if it is leading to an image?

This URL takes you to an image, but has no file extension to check a regex against:

http://www.tonymooreillustration.com/gallery/main.php?g2_view=core.DownloadItem&g2_itemId=393

I'm using RestClient (HTTP and REST client for Ruby) in my app, so I tried doing this:

RestClient.get "http://www.tonymooreillustration.com/gallery/main.php?g2_view=core.DownloadItem&g2_itemId=393"

I get back lots of text that begins like this:

"\377???JFIF\000\001\002\001\000H\000H\000\000\377?cExif\000\000MM\000*\000\000\000\b\000\a\001\022\000\003\000\000\000\001\000\001\000\000\001\032\000\005\000\000\000\001\000\000\000b\001\e\000\005\000\000\000\001\000\000\000j\001(\000\003\000\000\000\001\000\002\000\000\0011\000\002\000\000\000\024\000\000\000r\0012\000\002\000\000\000\024\000\000\000\206\207i\000\004\000\000\000\001\000\000\000\234\000\000\000?\000\000H\000\000\000\001\000\000\000H\000\000\000\001Adobe Photoshop 7.0\0002005:07:12 02:58:19\000\000\000\000\003\240\001\000\003\000\000\000\001\377\377\000\000\240\002\000\004\000\000\000\001\000\000\001?\000\004\000\000\000\001\000\000\002?\000\000\000\000\000\006\001\003\000\003\000\000\000

Is there a way I can use this to determine if the URL is pointing at an image?

Upvotes: 2

Views: 401

Answers (5)

dkam

Reputation: 3916

Use FastImage - it'll grab the minimum required data from the URL to determine whether it's an image, what type of image it is, and its size.

Upvotes: 0

Johannes Gorset

Reputation: 8785

Your best bet is the Content-Type header, but if all else fails you can derive the image format from the initial set of bytes:

  • JPG: 0xFF 0xD8
  • PNG: 0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A
  • GIF: 'G' 'I' 'F'

Search for <format> file format, replacing <format> with the other file formats you need to identify.
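As a rough illustration of the byte check above, here is a minimal Ruby sketch. The constant `MAGIC_BYTES` and the method `sniff_image_format` are hypothetical names made up for this example; the signatures are the ones listed in the bullets, and only those three formats are covered.

```ruby
# Hypothetical lookup table: format name => leading "magic" bytes.
MAGIC_BYTES = {
  jpeg: [0xFF, 0xD8].pack("C*"),
  png:  [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A].pack("C*"),
  gif:  "GIF".b
}.freeze

# Hypothetical helper: returns :jpeg, :png, :gif, or nil if nothing matches.
def sniff_image_format(data)
  head = data.to_s.b # compare as raw bytes, regardless of the string's encoding
  MAGIC_BYTES.each do |format, signature|
    return format if head.start_with?(signature)
  end
  nil
end
```

Assuming the body returned by `RestClient.get` behaves like a string, passing it (or just its first few bytes) to `sniff_image_format` should be enough to classify the response in the question as a JPEG.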

Upvotes: 1

Geuis

Reputation: 42267

I did this about 5 years ago in PHP. Sadly, I don't have the code any more. Basically, I used curl with the option to follow all redirects, so the data returned to the program was nearly always what I actually wanted to test. From there, I would grab only the first few bytes of the content and check whether image metadata was present and whether the format was JPG, PNG, or GIF. Having the code to show you would probably explain this a lot better, but it's gone. I likened this to "tasting" a file before eating it.

The advantage of using this kind of technique is that you're actually checking the file instead of relying on header info, which is usually correct but not always.

Upvotes: 0

mikej

Reputation: 66263

It looks like the REST Client response wraps Ruby's Net::HTTPResponse so if res is the result from RestClient.get you should be able to do:

res.net_http_res.header['content-type']

and see if this starts with image/ e.g. image/jpeg for a JPEG image.
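The "starts with image/" test can be wrapped in a small helper. This is only a sketch: `image_content_type?` is a hypothetical name, and it assumes the header value may be missing, mixed-case, or followed by parameters such as a charset.

```ruby
# Hypothetical helper: true when a Content-Type header value names an image type.
def image_content_type?(value)
  # Drop any parameters after ";" and normalize whitespace and case.
  media_type = value.to_s.split(";").first.to_s.strip.downcase
  media_type.start_with?("image/")
end
```

With `res` from the snippet above, the check would then read `image_content_type?(res.net_http_res.header['content-type'])`.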

If you don't actually need a copy of the image and just need to check what the URL points to, you are better off doing a HEAD request, as reto suggests. (This avoids receiving an unnecessary copy of the body content.)

Upvotes: 2

reto

Reputation: 16732

You could do a HEAD request and check the header for MIME information.

See: http://ruby-doc.org/stdlib/libdoc/net/http/rdoc/classes/Net/HTTP.html#M000682
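A HEAD-based check might look roughly like the sketch below, using Ruby's standard `Net::HTTP`. The method names `head_content_type` and `image_url?` are hypothetical. The sketch does not follow redirects and assumes the server answers HEAD requests with the same Content-Type it would send for a GET, which is not guaranteed.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: sends a HEAD request and returns the Content-Type
# header value, or nil if the server did not send one.
def head_content_type(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    response = http.request_head(uri.request_uri)
    response["content-type"]
  end
end

# Hypothetical helper: true when the reported MIME type starts with "image/".
# Redirects are not followed in this sketch.
def image_url?(url)
  head_content_type(url).to_s.downcase.start_with?("image/")
end
```

Calling `image_url?` with the URL from the question would issue a single HEAD request and never download the image body.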

The response you get in your example is the image itself. You could also try to determine whether or not this is a picture by using a utility like file [1] or an image library like ImageMagick [2].

[1] http://unixhelp.ed.ac.uk/CGI/man-cgi?file
[2] http://rmagick.rubyforge.org/

Upvotes: 2
