Cory Trentini

Reputation: 97

Regex: How to find all images, not only those in <img> tags

For the whole of last week I have been racking my brain to find a regular expression that can pull all the images out of an HTML source file. I know there are many out there, but most of them only parse the <img> tags. The tricky part is that sometimes the images are in JavaScript, and sometimes they are buried in long, unusual URLs such as:

http://pinterest.com/pin/create/button/?url=http://www.designscene.net/2015/07/binx-walton-josephine-le-tutour-vera-wang.html&media=http://www.designscene.net/wp-content/uploads/2015/07/Vera-Wang-Fall-Winter-2015-Patrick-Demarchelier-03-620x806.jpg&description=Binx Walton and Josephine Le Tutour for Vera Wang FW15

I have tried negative lookaheads and booleans but could not find a good solution. Any pointers would be appreciated.

Upvotes: 0

Views: 63

Answers (3)

Kristian Nordman

Reputation: 1

You should be able to search for any URL that ends in an image extension. This quick-and-dirty expression should do it:

(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})[\/\w \.-]*(jpg|png|gif|jpeg|tif|tiff)

Available at: http://regexr.com/3bg8o
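For anyone who wants to try this outside regexr, here is a minimal Python sketch. The pattern is copied verbatim from above; the function name `find_image_urls` and the sample inputs are made up for illustration. Note that the pattern does not require the extension to sit at the end of the URL, so it can also produce false positives.

```python
import re

# Pattern copied verbatim from the answer above.
IMAGE_URL = re.compile(
    r"(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})[\/\w \.-]*(jpg|png|gif|jpeg|tif|tiff)"
)

def find_image_urls(text):
    """Return every whole match of the pattern in text (hypothetical helper)."""
    # finditer + group(0) gives the full match; re.findall would instead
    # return a tuple of the four capture groups for each hit.
    return [m.group(0) for m in IMAGE_URL.finditer(text)]

# Made-up sample input: one URL with a scheme, one without.
sample = 'src="http://example.com/img/a.png" and //cdn.example.org/b.jpeg'
print(find_image_urls(sample))
# ['http://example.com/img/a.png', 'cdn.example.org/b.jpeg']

# False positive: "png" is matched inside the path segment "pngs".
print(find_image_urls("example.com/pngs/list.html"))
# ['example.com/png']
```

Because the path part only allows slashes, word characters, spaces, dots and hyphens, a match stops at characters such as ? or =, so query-string style links may need a different approach.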

Upvotes: 0

zolo

Reputation: 469

Would this help? https://regex101.com/r/jP4tV7/4

(http[^&"']+(?:jpg|gif|jpeg|png))(?:\&|'|")
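A rough Python sketch of how this pattern could be used, in case it helps. The pattern is the one above, unchanged; the helper name `extract_images` and the HTML snippet are assumptions for the sake of the example. It only finds URLs that start with http and are immediately followed by &, ' or ".

```python
import re

# Pattern from the answer above: an http URL ending in an image extension,
# followed by &, ' or " (the terminator is not part of the captured group).
IMAGE_URL = re.compile(r"""(http[^&"']+(?:jpg|gif|jpeg|png))(?:\&|'|")""")

def extract_images(text):
    """Return the captured URLs (hypothetical helper)."""
    # With exactly one capturing group, findall returns that group's text.
    return IMAGE_URL.findall(text)

# Made-up snippet: one image in a tag, one inside JavaScript.
html = """<img src="http://example.com/x.png"> <script>var u = 'http://example.com/y.jpg';</script>"""
print(extract_images(html))
# ['http://example.com/x.png', 'http://example.com/y.jpg']

# The long link from the question: only the media= URL is returned.
link = ("http://pinterest.com/pin/create/button/?url=http://www.designscene.net/2015/07/"
        "binx-walton-josephine-le-tutour-vera-wang.html&media=http://www.designscene.net/"
        "wp-content/uploads/2015/07/Vera-Wang-Fall-Winter-2015-Patrick-Demarchelier-03-620x806.jpg"
        "&description=Binx Walton and Josephine Le Tutour for Vera Wang FW15")
print(extract_images(link))
```

Since [^&"']+ cannot cross an ampersand, the leading pinterest.com and url= parts are skipped because neither ends in an image extension before the next &.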

Upvotes: 0

Koray Birand

Reputation: 1986

Well, as you said, there are many ways to do that, and to be honest there is no regex solution that can parse every HTML file out there. I have tried it in the past as well. For me the one below worked best:

/(?:.(?!http|\,))+(\.jpg|\.png)

A bit of an explanation:

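/......(\.jpg|\.png) : starts from the first slash it finds and runs until it finds an image extension; the dots stand for any characters between the slash and the extension.

(?:.(?!http|\,))+ : a character is only accepted if it is not followed by http or a comma, so a candidate is dropped if it has http or , in it (works like a charm for the example link you have given).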
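Here is a small Python sketch of the pattern run against the link from the question. The pattern is the one above, with the leading slash treated as a literal character; the variable names are only for illustration. Note that the match begins at the slashes after the scheme, so the http: prefix is not part of the result.

```python
import re

# Pattern from the answer above; the leading "/" is a literal slash.
PATTERN = re.compile(r"/(?:.(?!http|\,))+(\.jpg|\.png)")

# The example link from the question, split across lines for readability.
link = ("http://pinterest.com/pin/create/button/?url=http://www.designscene.net/2015/07/"
        "binx-walton-josephine-le-tutour-vera-wang.html&media=http://www.designscene.net/"
        "wp-content/uploads/2015/07/Vera-Wang-Fall-Winter-2015-Patrick-Demarchelier-03-620x806.jpg"
        "&description=Binx Walton and Josephine Le Tutour for Vera Wang FW15")

# search + group(0) returns the whole match; group(1) would only be ".jpg".
match = PATTERN.search(link)
image = match.group(0) if match else None
print(image)
# //www.designscene.net/wp-content/uploads/2015/07/Vera-Wang-Fall-Winter-2015-Patrick-Demarchelier-03-620x806.jpg
```

If a full URL is needed, the scheme has to be put back in front of the result by hand, for example by prepending "http:".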

Hope it helps. Regex is a very complex world, and you can write the same expression in many different ways. Maybe there is a better solution than the one I suggest.

Upvotes: 1
