Reputation: 79
I've developed an image scraper that scrapes specific images from remote sites and displays them when a URL is pasted into a text field. The logic includes finding images whose URLs end in .jpg, .jpeg, .png, etc.
I'm running into an issue where a lot of sites generate images via JavaScript and/or don't include the image extension in the displayed image URL. Sites like
www.express.com and www.underarmour.com have this issue, and many more do as well.
What function could I use to find images from a given URL, and then display them accordingly, when they do not have a file extension?
Thanks again.
Upvotes: 0
Views: 257
Reputation: 2634
I think you have two options:

1. Generate some heuristic for whether a URL could be an image (like looking for a part such as /images/ in the URL)
2. Load every URL and check whether the returned data is an image (using, for example, getimagesize())

The second option is more general, but quite heavy on both bandwidth and resources; a rough sketch of both is below.
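A minimal PHP sketch of both options (the question mentions getimagesize(); the helper names, the extension list, and the /images/ pattern are just illustrative assumptions):

```php
<?php
// Option 1: cheap heuristic - guess from the URL alone.
// The extension list and /images/ pattern are assumptions, not a rule.
function looksLikeImageUrl(string $url): bool
{
    return (bool) preg_match('~\.(jpe?g|png|gif|webp)(\?|$)~i', $url)
        || stripos($url, '/images/') !== false;
}

// Option 2: heavier but more reliable - fetch the data and let
// getimagesize() decide (requires allow_url_fopen for remote URLs).
function isActuallyImage(string $url): bool
{
    return @getimagesize($url) !== false;
}

// Usage: try the cheap check first, fall back to the expensive one.
$candidates = ['https://example.com/media/12345', 'https://example.com/img/logo.png'];
foreach ($candidates as $url) {
    if (looksLikeImageUrl($url) || isActuallyImage($url)) {
        echo "image: $url\n";
    }
}
```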
Upvotes: 1
Reputation: 360812
Unless the URL comes from an <img src="...">
tag, there is NO way to tell what you'll get from a particular URL. http://example.com/index.html
could very well actually be a PHP script that serves up a zip file.
It is IMPOSSIBLE to reliably tell what a URL will give you until you actually hit the URL and check the headers plus the downloaded data.
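A minimal PHP sketch of that check, assuming get_headers() is acceptable here (the function name urlServesImage is just for illustration, and servers can still mislabel content):

```php
<?php
// Inspect the response headers for a Content-Type starting with "image/".
// Servers can lie or reject the request, so a full download plus
// getimagesize() remains the only definitive check.
function urlServesImage(string $url): bool
{
    $headers = @get_headers($url, true); // associative array of response headers
    if ($headers === false) {
        return false;
    }
    $type = $headers['Content-Type'] ?? '';
    if (is_array($type)) {               // redirects can yield several Content-Type values
        $type = end($type);
    }
    return stripos((string) $type, 'image/') === 0;
}

var_dump(urlServesImage('http://example.com/index.html')); // likely false: text/html
```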
Upvotes: 1