Chris Favaloro

Reputation: 79

PHP scrape remote images that do not have extensions

I've developed an image scraper that scrapes specific images from remote sites and displays them when a URL is pasted into a text field. The logic finds images whose URLs end in .jpg, .jpeg, .png, etc.

I'm running into an issue where a lot of sites generate images via JavaScript and/or serve images whose URLs do not include a file extension. Sites such as www.express.com and www.underarmour.com have this issue, along with many others.

What function could I use to find images at a given URL when they do not have a file extension, and then display them accordingly?

Thanks again.

Upvotes: 0

Views: 257

Answers (2)

apfelbox

Reputation: 2634

I think you have two options:

  1. Build some heuristics for whether a URL could be an image (for example, looking for a /images/ segment in the URL).

  2. Load every URL and check whether the returned data is an image (using, for example, getimagesize()).

The second option is more general, but quite heavy on both bandwidth and resources.
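A minimal sketch of both options, in case it helps. The function names looksLikeImageUrl() and isImageData(), the extension list, and the path hints (/images/, /img/, /media/) are assumptions made for illustration, not a complete rule set. The second helper uses getimagesizefromstring(), the in-memory counterpart of getimagesize(), so it works on bytes you have already downloaded.

```php
<?php
// Hypothetical helper for option 1: a cheap guess based only on the URL's shape.
function looksLikeImageUrl(string $url): bool
{
    // parse_url() may return null or false; the cast turns both into "".
    $path = (string) parse_url($url, PHP_URL_PATH);

    // A known image extension at the end of the path.
    if (preg_match('/\.(jpe?g|png|gif|webp)$/i', $path)) {
        return true;
    }

    // Otherwise, a path segment that hints at images (assumed list).
    return (bool) preg_match('#/(images?|img|media)/#i', $path);
}

// Hypothetical helper for option 2: inspect the actual bytes.
function isImageData(string $bytes): bool
{
    // getimagesizefromstring() returns false when the data is not a
    // recognized image; @ hides the warning it may emit in that case.
    return @getimagesizefromstring($bytes) !== false;
}
```

One way to limit the bandwidth cost is to run the cheap URL check first and only download and inspect the URLs it cannot decide.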

Upvotes: 1

Marc B

Reputation: 360812

Unless the URL comes from <img src="...">, there is NO way to tell what you'll get from a particular URL. http://example.com/index.html could very well be a PHP script that serves up a zip file.

It is IMPOSSIBLE to reliably tell what a URL will give you until you actually hit the URL and check the headers and the downloaded data.
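As a rough sketch of the header half of that check: the helper below takes headers in the shape returned by get_headers($url, true) and reports whether the final Content-Type starts with image/. The name contentTypeIsImage() is hypothetical. Servers can send a misleading Content-Type, so this is best paired with a look at the body bytes, for example with getimagesizefromstring().

```php
<?php
// Hypothetical helper: decide from response headers whether a URL served an image.
// $headers is expected in the shape returned by get_headers($url, true).
function contentTypeIsImage(array $headers): bool
{
    // Header names are case-insensitive, so normalize the keys first.
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (!isset($headers['content-type'])) {
        return false;
    }

    $type = $headers['content-type'];
    // After redirects the same header can appear several times; use the last one.
    if (is_array($type)) {
        $type = end($type);
    }

    // Matches "image/png", "image/jpeg; charset=binary", and so on.
    return stripos(trim((string) $type), 'image/') === 0;
}

// Usage sketch (performs a real request, so it is left commented out):
// $headers = @get_headers('http://example.com/index.html', true);
// if ($headers !== false && contentTypeIsImage($headers)) {
//     // Download the body and verify the bytes before displaying it.
// }
```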

Upvotes: 1
