Reputation: 328
I want to match image URLs that start with "//" and end with ".jpg", ".png", or ".gif". I wrote the following regular expression, and it works, but not in all cases.
var pattern = /\/{2}.+?\.(jpg|png|gif)/gm;
The problem is that it also matches something like this:
//pm.pstatic.net/dist/css/nmain.20201119.css"> <link rel="apple-touch-icon-precomposed" sizes="114x114" href="https://s.pstatic.net/static/www/u/2014/0328/mma_204243574.png
This is obviously not what I want. I need to start at the last occurrence of "//" and lazily match up to ".png", ".jpg", or ".gif". In this case, the match should be //s.pstatic.net/static/www/u/2014/0328/mma_204243574.png
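For reference, the over-match can be reproduced with a short script. This is only a minimal sketch: the variable names are illustrative, and the sample string is the one shown above.

```javascript
// Original pattern from the question, unchanged.
const originalPattern = /\/{2}.+?\.(jpg|png|gif)/gm;

// The problematic input shown above.
const overMatchSample = '//pm.pstatic.net/dist/css/nmain.20201119.css"> <link rel="apple-touch-icon-precomposed" sizes="114x114" href="https://s.pstatic.net/static/www/u/2014/0328/mma_204243574.png';

// With the g flag, match() returns an array of full matches.
const overMatches = overMatchSample.match(originalPattern);

// The lazy ".+?" still starts at the FIRST "//" and only stops at the first
// image extension, so the single match spans the whole sample string.
console.log(overMatches.length);
console.log(overMatches[0] === overMatchSample);
```

Laziness only controls where the match ends, not where it begins, which is why the leading stylesheet URL is swallowed.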
What should I use to solve this problem?
+edit
The website I want to scrape contains image URLs that look like this:
<a href="javascript:;" style="background:url(//gd4.alicdn.com/imgextra/i4/2748816012/O1CN01gbXzeB1uHXhQ9eTVd_!!2748816012.jpg_30x30.jpg)
So a normal image URL matcher doesn't work.
Also, the match on ".jpg" must be lazy, because as you can see in the URL above, the image address looks like //gd4.alicdn.com/imgextra/i4/2748816012/O1CN01gbXzeB1uHXhQ9eTVd_!!2748816012.jpg_30x30.jpg
The match needs to end at the first occurrence of ".jpg"; otherwise I will only scrape the small 30x30 image, which I don't want. In this case, the image URL I want is //gd4.alicdn.com/imgextra/i4/2748816012/O1CN01gbXzeB1uHXhQ9eTVd_!!2748816012.jpg
Upvotes: 5
Views: 3131
Reputation: 786241
You may try this regex:
/\/\/(\S+?(?:jpe?g|png|gif))/ig
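As a quick check, the pattern can be run against the two samples from the question. This is a minimal sketch: findImageUrls is a hypothetical helper name, and the sample strings are copied from the question.

```javascript
// Pattern from this answer, unchanged.
const imagePattern = /\/\/(\S+?(?:jpe?g|png|gif))/ig;

// Sample strings copied from the question.
const htmlSample = '//pm.pstatic.net/dist/css/nmain.20201119.css"> <link rel="apple-touch-icon-precomposed" sizes="114x114" href="https://s.pstatic.net/static/www/u/2014/0328/mma_204243574.png';
const cssSample = '<a href="javascript:;" style="background:url(//gd4.alicdn.com/imgextra/i4/2748816012/O1CN01gbXzeB1uHXhQ9eTVd_!!2748816012.jpg_30x30.jpg)';

// Hypothetical helper: collect every full match, including the leading "//".
function findImageUrls(text) {
  return Array.from(text.matchAll(imagePattern), (m) => m[0]);
}

console.log(findImageUrls(htmlSample));
// [ '//s.pstatic.net/static/www/u/2014/0328/mma_204243574.png' ]
console.log(findImageUrls(cssSample));
// [ '//gd4.alicdn.com/imgextra/i4/2748816012/O1CN01gbXzeB1uHXhQ9eTVd_!!2748816012.jpg' ]
```

The first "//" in the HTML sample is skipped because \S cannot cross the space after the stylesheet URL, so no image extension is reachable from there. Note that the pattern does not require a literal dot before the extension, so a path segment that merely contains "gif" or "png" would also end a match.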
RegEx Details:
\/\/ : Match //
( : Start capture group #1
\S+? : Match 1+ non-whitespace characters (lazy)
(?:jpe?g|png|gif) : Match jpg, jpeg, png or gif
) : End capture group
Upvotes: 3
Reputation: 538
You can try the following regex:
(http(s?):)([/|.|\w|\s|-])*\.(?:jpg|gif|png)
Also, you can test your Regex here:
https://regex101.com/r/l2Zt7S/1
Just for fun, here's a Regex that matches all types of image urls:
^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+(?:png|jpg|jpeg|gif|svg)+$
What intrigues me is how to select the last occurrence of "//". But let's see if someone comes up with a way to solve that.
Here's the match that I got when I tested my Regex with the URL you shared.
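On the open point about selecting the last occurrence of "//": one possible approach is a negative lookahead that forbids the match from running across another "//". The sketch below is an assumption on my part, checked only against the two samples from the question; the name lastSlashPattern is illustrative.

```javascript
// "//", then lazily any characters as long as none of them starts another "//",
// then a literal dot and an image extension.
const lastSlashPattern = /\/\/(?:(?!\/\/).)+?\.(?:jpe?g|png|gif)/gi;

// Sample strings copied from the question.
const linkSample = '//pm.pstatic.net/dist/css/nmain.20201119.css"> <link rel="apple-touch-icon-precomposed" sizes="114x114" href="https://s.pstatic.net/static/www/u/2014/0328/mma_204243574.png';
const styleSample = '<a href="javascript:;" style="background:url(//gd4.alicdn.com/imgextra/i4/2748816012/O1CN01gbXzeB1uHXhQ9eTVd_!!2748816012.jpg_30x30.jpg)';

// match() with the g flag returns all full matches (or null if there are none).
const linkMatches = linkSample.match(lastSlashPattern);
const styleMatches = styleSample.match(lastSlashPattern);

console.log(linkMatches);
// [ '//s.pstatic.net/static/www/u/2014/0328/mma_204243574.png' ]
console.log(styleMatches);
// [ '//gd4.alicdn.com/imgextra/i4/2748816012/O1CN01gbXzeB1uHXhQ9eTVd_!!2748816012.jpg' ]
```

An attempt starting at the first "//" fails because it cannot reach an image extension before hitting the next "//", so the engine moves on and the match begins at the "//" closest to the extension. The lazy quantifier still stops at the first ".jpg", which keeps the full-size image URL rather than the 30x30 variant.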
Upvotes: 3