Reputation: 97
For the past week I have been racking my brain to find a regular expression that can extract all the image URLs from an HTML source file. I know there are many out there, but they mainly parse the <img> tags. The tricky part is that sometimes the images are in JavaScript, and sometimes they are buried in long, unusual URLs such as:
http://pinterest.com/pin/create/button/?url=http://www.designscene.net/2015/07/binx-walton-josephine-le-tutour-vera-wang.html&media=http://www.designscene.net/wp-content/uploads/2015/07/Vera-Wang-Fall-Winter-2015-Patrick-Demarchelier-03-620x806.jpg&description=Binx Walton and Josephine Le Tutour for Vera Wang FW15
I have tried negative lookaheads and booleans but could not find a good solution. Please give me some perspective.
Upvotes: 0
Views: 63
Reputation: 1
You should be able to search for any URL that ends in an image extension. This quick-and-dirty expression should do it:
(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})[\/\w \.-]*(jpg|png|gif|jpeg|tif|tiff)
Available at: http://regexr.com/3bg8o
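If you want to try the expression outside regexr, here is a minimal Python sketch. The pattern is copied unchanged from above; the variable names and the second sample string are my own assumptions, added only for illustration.

```python
import re

# Pattern copied unchanged from the answer above.
IMAGE_URL = re.compile(
    r"(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})[\/\w \.-]*(jpg|png|gif|jpeg|tif|tiff)"
)

# The long link from the question, split only to keep the lines short.
link = (
    "http://pinterest.com/pin/create/button/"
    "?url=http://www.designscene.net/2015/07/"
    "binx-walton-josephine-le-tutour-vera-wang.html"
    "&media=http://www.designscene.net/wp-content/uploads/2015/07/"
    "Vera-Wang-Fall-Winter-2015-Patrick-Demarchelier-03-620x806.jpg"
    "&description=Binx Walton and Josephine Le Tutour for Vera Wang FW15"
)

# group(0) is the whole match; the numbered groups only hold fragments.
found = IMAGE_URL.search(link).group(0)
print(found)

# Hypothetical counter-example: the pattern does not require a dot before
# the extension, so it stops at the "jpg" inside "jpgallery".
loose = IMAGE_URL.search("see example.com/jpgallery for more").group(0)
print(loose)
```

On the link from the question this picks out the media image URL. The second call shows why it is only quick and dirty: anything that looks like a host followed by the letters of an extension will match.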
Upvotes: 0
Reputation: 469
Would this help? https://regex101.com/r/jP4tV7/4
(http[^&"']+(?:jpg|gif|jpeg|png))(?:\&|'|")
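Here is a rough Python sketch of how this one behaves. The pattern is the one above, unchanged; the variable names and the two short example.com strings are assumptions made up for the demo.

```python
import re

# Pattern copied unchanged from the answer above. It needs a delimiter
# (&, ' or ") right after the extension, and captures the URL in group 1.
IMAGE_URL = re.compile(r'''(http[^&"']+(?:jpg|gif|jpeg|png))(?:\&|'|")''')

# The long link from the question, split only to keep the lines short.
link = (
    "http://pinterest.com/pin/create/button/"
    "?url=http://www.designscene.net/2015/07/"
    "binx-walton-josephine-le-tutour-vera-wang.html"
    "&media=http://www.designscene.net/wp-content/uploads/2015/07/"
    "Vera-Wang-Fall-Winter-2015-Patrick-Demarchelier-03-620x806.jpg"
    "&description=Binx Walton and Josephine Le Tutour for Vera Wang FW15"
)

# With a single capturing group, findall returns the group-1 strings.
from_link = IMAGE_URL.findall(link)
print(from_link)

# Hypothetical HTML snippet: the closing quote acts as the delimiter.
from_tag = IMAGE_URL.findall('<img src="http://example.com/a.png">')
print(from_tag)

# Hypothetical bare URL: nothing follows the extension, so nothing matches.
bare = IMAGE_URL.findall("http://example.com/a.png")
print(bare)
```

The last case is worth knowing about: a URL that sits at the very end of the text, or is followed by whitespace, is skipped because the trailing delimiter is mandatory.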
Upvotes: 0
Reputation: 1986
Well, as you said, there are many ways to do this, and to be honest there is no regex that can parse every HTML file out there. I have tried it in the past as well. For me, the following worked best:
/(?:.(?!http|\,))+(\.jpg|\.png)
A bit of an explanation:
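/......(\.jpg|\.png) starts at the first slash it finds and runs until it reaches an image extension; the dots stand for any characters between the slash and the extension. (?:.(?!http|\,))+ matches those characters one at a time and stops as soon as the next thing would be http or a comma, so the match cannot run across the start of another URL (works like a charm for the example link you have given).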
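To see what that gives on the link from the question, here is a small Python sketch. The pattern is unchanged; the variable names are my own, and putting "http:" back on the front is an assumption about what you would want to do with the result, not part of the expression.

```python
import re

# Pattern copied unchanged from the answer above.
IMAGE_PATH = re.compile(r"/(?:.(?!http|\,))+(\.jpg|\.png)")

# The long link from the question, split only to keep the lines short.
link = (
    "http://pinterest.com/pin/create/button/"
    "?url=http://www.designscene.net/2015/07/"
    "binx-walton-josephine-le-tutour-vera-wang.html"
    "&media=http://www.designscene.net/wp-content/uploads/2015/07/"
    "Vera-Wang-Fall-Winter-2015-Patrick-Demarchelier-03-620x806.jpg"
    "&description=Binx Walton and Josephine Le Tutour for Vera Wang FW15"
)

match = IMAGE_PATH.search(link)
path = match.group(0)        # whole match, starting at a slash
extension = match.group(1)   # the captured extension
print(path)
print(extension)

# The match starts at the first slash of "http://", so the scheme is missing.
# Assumption: the page is served over plain http, so we put it back by hand.
full_url = "http:" + path
print(full_url)
```

The slashes in the first two URLs are skipped because each run hits "=http" before it finds an extension, so only the media URL survives. Note that the result comes back as //www.designscene.net/... without the scheme.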
Hope it helps. Regex is a very complex world, and you can write the same expression in many different ways. There may be a better solution than the one I suggest.
Upvotes: 1