xarhx

Reputation: 11

regex - match some urls

I'm using the following regex to pick up all http, https, or ftp URLs from within a large string:

/(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim;

I want to extend it so that it does NOT pick up any URLs that are immediately preceded by src=" (that is, URLs inside a src attribute).

Match: https://xxx.yyy.com

No Match: src="https://xxx.yyy.com

I've tried using a negative lookbehind to exclude src=", with no success.

Upvotes: 1

Views: 78

Answers (2)

Dmitry Egorov

Reputation: 9650

Lookbehinds were not supported in JavaScript before ES2018, and older engines still lack them. You can solve this without one by explicitly matching the src=" in an optional group and then filtering out every match in which that group participated:

var input = `Match: https://match.xxx.yyy.com
     No Match: src="https://fail.xxx.yyy.com`;
var regex = /(src=")?\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/gim;
var urls = [];

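// Collect only the matches without the `src="` prefix.
// replace() is used here purely to iterate over the matches; its return value is ignored.
input.replace(regex, function (match, src) {
  if (!src) {
    urls.push(match);
  }
});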

console.log(urls);
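
For comparison, here is a minimal sketch of the negative lookbehind the question was attempting. It is not part of the approach above and assumes a runtime that implements ES2018 lookbehind assertions (recent Node.js and current browsers). The names `sample`, `lookbehindRegex`, and `lookbehindUrls` are made up for illustration.

```javascript
// Assumption: the engine supports ES2018 lookbehind; older engines throw a SyntaxError here.
const sample = `Match: https://match.xxx.yyy.com
No Match: src="https://fail.xxx.yyy.com`;

// (?<!src=") rejects a URL that starts immediately after src="
const lookbehindRegex = /(?<!src=")\b(?:https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/gi;

// match() with the g flag returns all matches, or null when there are none
const lookbehindUrls = sample.match(lookbehindRegex) || [];

console.log(lookbehindUrls);
```

One difference worth knowing: the lookbehind only inspects the start of each candidate, so a second URL embedded inside a src value (for example in a query string) could still be picked up, whereas the optional-group version consumes the whole src URL first.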

Upvotes: 0

Aleksandar Makragić

Reputation: 1997

JavaScript regular expressions did not support lookbehinds before ES2018.

One common workaround is to require a character other than " immediately before the URL:

[^"]https:\/\/[a-z.]+

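This is only a starting point: you should write a more detailed regex for the domain, and then skip the first character of each match to get the URL. As written, the pattern needs a preceding character, so it will not match a URL at the very start of the string.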
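
A small sketch of this idea, with two adjustments that are assumptions rather than part of the answer: the leading character is made optional at the start of the string with `(?:^|[^"])`, and a capture group returns the URL so no character has to be skipped by hand. The names `text`, `pattern`, and `found` are illustrative only, and the domain part is still the deliberately simple `[a-z.]+`.

```javascript
const text = 'Match: https://match.xxx.yyy.com\nNo Match: src="https://fail.xxx.yyy.com';

// (?:^|[^"]) consumes one non-quote character before the URL, or matches at the string start.
// Group 1 holds the URL itself, without that leading character.
const pattern = /(?:^|[^"])(https:\/\/[a-z.]+)/g;

const found = [];
let m;
while ((m = pattern.exec(text)) !== null) {
  found.push(m[1]);
}

console.log(found);
```

Note that this rejects any URL directly preceded by a double quote, not only those following src=", so it is stricter than what the question asks for.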

Upvotes: 1
