Junho Lee

Reputation: 328

Regex match url of image

I want to match image URLs that start with "//" and end with ".jpg", ".png", or ".gif". I wrote the following regular expression, and it works, but not in all cases.

var pattern = /\/{2}.+?\.(jpg|png|gif)/gm;

The problem is that it also matches text like this:

//pm.pstatic.net/dist/css/nmain.20201119.css"> <link rel="apple-touch-icon-precomposed" sizes="114x114" href="https://s.pstatic.net/static/www/u/2014/0328/mma_204243574.png

This is obviously not what I want. I need the match to start at the last occurrence of "//" and end lazily at ".png", ".jpg", or ".gif". In this case, the result should be //s.pstatic.net/static/www/u/2014/0328/mma_204243574.png

What should I use to solve this problem?

+edit

The website I want to scrape contains image URLs that look like this:

<a href="javascript:;" style="background:url(//gd4.alicdn.com/imgextra/i4/2748816012/O1CN01gbXzeB1uHXhQ9eTVd_!!2748816012.jpg_30x30.jpg)

So a normal image URL matcher doesn't work.

Also, the match on ".jpg" must be lazy because, as the URL above shows, the image address looks like //gd4.alicdn.com/imgextra/i4/2748816012/O1CN01gbXzeB1uHXhQ9eTVd_!!2748816012.jpg_30x30.jpg

The match needs to end at the first occurrence of ".jpg"; otherwise I will only scrape the small 30x30 image, which I don't want. In this case, the image URL I want is //gd4.alicdn.com/imgextra/i4/2748816012/O1CN01gbXzeB1uHXhQ9eTVd_!!2748816012.jpg

Upvotes: 5

Views: 3131

Answers (2)

anubhava

Reputation: 786241

You may try this regex:

/\/\/(\S+?(?:jpe?g|png|gif))/ig

RegEx Demo

RegEx Details:

  • \/\/: Match //
  • (: Start capture group #1
  • \S+?: Match 1+ non-whitespaces (lazy)
  • (?:jpe?g|png|gif): Match jpg, jpeg, png or gif
  • ): End capture group
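As a rough usage sketch, the pattern above can be run against the two snippets from the question. The helper name `extractImageUrls` and the variable names are illustrative assumptions, not part of the answer. Because `\S` cannot cross whitespace, an attempt that starts at an earlier "//" fails at the first space, and the engine retries at the next "//".

```javascript
// Hypothetical helper (not from the answer): collects every match of the
// answer's pattern, including the leading "//".
function extractImageUrls(text) {
  const pattern = /\/\/(\S+?(?:jpe?g|png|gif))/ig;
  // matchAll requires the g flag; m[0] is the full match, m[1] the part after "//".
  return Array.from(text.matchAll(pattern), (m) => m[0]);
}

// Sample inputs copied from the question.
const linkTag =
  '//pm.pstatic.net/dist/css/nmain.20201119.css"> <link rel="apple-touch-icon-precomposed" sizes="114x114" href="https://s.pstatic.net/static/www/u/2014/0328/mma_204243574.png';
const styleAttr =
  '<a href="javascript:;" style="background:url(//gd4.alicdn.com/imgextra/i4/2748816012/O1CN01gbXzeB1uHXhQ9eTVd_!!2748816012.jpg_30x30.jpg)';

console.log(extractImageUrls(linkTag));
// [ '//s.pstatic.net/static/www/u/2014/0328/mma_204243574.png' ]
console.log(extractImageUrls(styleAttr));
// [ '//gd4.alicdn.com/imgextra/i4/2748816012/O1CN01gbXzeB1uHXhQ9eTVd_!!2748816012.jpg' ]
```

Note that the pattern does not require a literal dot before the extension, so a path segment that merely contains "gif" or "png" would also end the match. Writing `\.(?:jpe?g|png|gif)` would be a stricter variant, if that matters for your input.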

Upvotes: 3

aryashah2k

Reputation: 538

You can try the following regex:

(http(s?):)([/|.|\w|\s|-])*\.(?:jpg|gif|png)

Also, you can test your Regex here:

https://regex101.com/r/l2Zt7S/1

Just for fun, here's a regex that matches a wider range of image URLs (png, jpg, jpeg, gif, svg) when the whole string is a URL:

^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+(?:png|jpg|jpeg|gif|svg)+$

What intrigues me is how to select the last occurrence of "//". But let's see if someone comes up with a way to solve that.
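One possible way to express "start at the last //" is a negative lookahead that forbids a second "//" inside the match. This is a sketch under that assumption, not something tested against the real site. The names `lastSlashesPattern` and `extractNearestImageUrls` are made up for illustration.

```javascript
// Sketch: (?!\/\/). consumes one character only if it does not begin another "//",
// so a match can never span two "//" occurrences. The +? keeps it lazy, so it
// stops at the first ".jpg", ".jpeg", ".png" or ".gif".
const lastSlashesPattern = /\/\/(?:(?!\/\/).)+?\.(?:jpe?g|png|gif)/gi;

// Hypothetical helper: returns all matches, or an empty array when there are none.
function extractNearestImageUrls(text) {
  return text.match(lastSlashesPattern) || [];
}

// The first "//" is followed by another "//" before any image extension,
// so only the second URL is returned.
console.log(extractNearestImageUrls('see //a.example/x y //b.example/pic.png'));
// [ '//b.example/pic.png' ]
```

Unlike a `\S`-based pattern, this one allows whitespace inside the match, and `.` will not cross a newline unless the `s` flag is added.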

Here's the match that I got when I tested my Regex with the URL you shared.

[Image: match description]

Upvotes: 3
