Joe

Reputation: 1055

JavaScript regex negation: detect URLs NOT containing a given domain

I need to scan some HTML files and extract the URLs that do not point to two specific websites.

After many tests I came up with this:

/(http|https)?:?(\/\/)\w*\.*\-*[^(mysite.com)]\w*\.?\S*/igm

That works reasonably well, but not perfectly.

For example, as you can see HERE on regexr.com, it matches

// End

but not

www.demo.com

while it should be the other way around. But if I add a ? after (\/\/), it becomes a useless "catch all".

Also, if the URL has a " at the beginning and at the end (which clearly happens frequently), it does not grab the starting " (correct) but it does grab the ending one (wrong).

Finally, it should also not match theothermysite.net, but I have not really understood how to handle OR with negation :-(

Can anyone help, please?

Joe

Upvotes: 0

Views: 57

Answers (1)

Fabian N.

Reputation: 3856

Like this?

/((http|https):(\/\/)|www\.)\w*\.*\-*[^(mysite.com)(theothermysite.net)]\w*\.?[^\s\t\r\n\"]*/igm

I just added an "or www." alternative, replaced \S with its components plus \", and added a second parenthesized name to the negation, the same way you already did with mysite.com.
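One caveat: in JavaScript, [^...] is a negated character class, so [^(mysite.com)(theothermysite.net)] rejects a single character from that set of letters and punctuation, not the domain names as whole words. A negative lookahead is the usual way to say "not followed by this domain". Below is a minimal sketch of that idea, not a full URL parser. The function names (extractExternalUrls, escapeRegExp) and the sample HTML are made up for illustration, and I am assuming that a URL ends at whitespace, a quote, or an angle bracket and that the host contains at least one dot.

```javascript
// Hypothetical helper: escape regex metacharacters so "mysite.com" matches literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return every URL in `html` whose host is NOT one of
// `excludedDomains` (or a subdomain of one of them).
function extractExternalUrls(html, excludedDomains) {
  const blocked = excludedDomains.map(escapeRegExp).join("|");
  const source =
    // Prefix: "http://", "https://", "//", or a bare host that starts with "www."
    String.raw`(?:(?:https?:)?\/\/|\b(?=www\.))` +
    // Negative lookahead: the host must not be [subdomains.]blockedDomain,
    // where the blocked domain is followed by the end of the host name.
    String.raw`(?!(?:[\w-]+\.)*(?:${blocked})(?![\w.-]))` +
    // Host with at least one dot, then anything up to whitespace, a quote, or < >.
    String.raw`[\w-]+(?:\.[\w-]+)+[^\s"'<>]*`;
  return html.match(new RegExp(source, "gi")) || [];
}

// Made-up sample input, only to exercise the cases from the question.
const sampleHtml = `
<a href="http://www.demo.com/page">a</a>
<a href="https://mysite.com/x">b</a>
<a href="//cdn.theothermysite.net/lib.js">c</a>
<a href="www.example.org">d</a>
// End
<a href="https://notmysite.com/">e</a>
`;

console.log(extractExternalUrls(sampleHtml, ["mysite.com", "theothermysite.net"]));
// → [ 'http://www.demo.com/page', 'www.example.org', 'https://notmysite.com/' ]
```

The "OR with negation" part is just an alternation inside the lookahead: (?!...(?:mysite\.com|theothermysite\.net)...). Requiring a dotted host is what keeps "// End" from matching, and excluding " from the tail is what drops the closing quote. Note that hosts without a dot (such as localhost) are skipped under these assumptions.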

Upvotes: 1
