Reputation: 4703
I'm trying to create a regex that matches Markdown URLs while ignoring the content that comes before and after them. It should match only local Markdown URLs, which point to local files, and ignore ones that point to external websites. Example:
"dddd [link which should be ignore](http://google.com/) lorem ipsum lorem ips sum loreerm [link which shouldn't be ignored](../../../filepath/folder/some-other-folder/another-folder/one-last-folder/file-example.html). lorem ipsum lorem"
It should match only the second link. Currently, it matches everything. My regex works for what I need, but this seems to be the major edge case I've found.
What I have so far:
/(!?\[.*?\]\((?!.*?http)(?!.*?www\.)(?!.*?#)(?!.*?\.com)(?!.*?\.net)(?!.*?\.info)(?!.*?\.org).*?\))/g
Currently, this ignores the first link and matches the second link IF the second link doesn't come after the first link. Otherwise, it matches everything from the first to the second.
I'm using JavaScript, which doesn't support negative lookbehinds. Any suggestions?
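A minimal reproduction of the behaviour described above, run against the example string with the regex exactly as posted. The variable names (sample, original, matches) are illustrative only:

```javascript
// The example text from the question, with the external link first.
const sample =
  'dddd [link which should be ignore](http://google.com/) lorem ipsum lorem ips sum loreerm ' +
  "[link which shouldn't be ignored](../../../filepath/folder/some-other-folder/another-folder/one-last-folder/file-example.html). lorem ipsum lorem";

// The regex from the question, unchanged.
const original = /(!?\[.*?\]\((?!.*?http)(?!.*?www\.)(?!.*?#)(?!.*?\.com)(?!.*?\.net)(?!.*?\.info)(?!.*?\.org).*?\))/g;

const matches = sample.match(original) || [];

// A single match that starts at the first link and ends at the second one.
console.log(matches.length); // 1
console.log(matches[0]);
```

The lazy .*? inside the square brackets is allowed to run past the first ] once the lookaheads reject the external URL, which is why the one match spans both links.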
Upvotes: 2
Views: 2732
Reputation: 89614
Testing whether a URL is local or external is not a job for regex. As you can see with the third link in the example string below, testing whether the URI contains .org, .com, http, # or whatever is just wrong.
This code shows how to tell whether a URL is local or not, in a replacement context on the client side:
var text = '[external link](http://adomain.com/path/file.txt) ' +
           '[local link](../path/page.html) ' +
           '[local link](../path.org/http/file.com.php#fragment)';

// g1 is the link text, g2 is the link target
text = text.replace(/\[([^\]]*)\]\(([^)]*)\)/g, function (_, g1, g2) {
    // Let the browser resolve the target against the current page
    var myurl = document.createElement('a');
    myurl.href = g2;
    // Same hostname as the current page means the link is local
    return window.location.hostname === myurl.hostname ? "locrep" : "extrep";
});

console.log(text);
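The same idea can be sketched without a DOM (for example in Node) by resolving each target with the standard URL constructor. This is only a sketch under assumptions: BASE is a made-up address standing in for window.location.href, and isLocalHref and tagLinks are hypothetical helper names, not part of any library:

```javascript
// Hypothetical base address standing in for window.location.href.
const BASE = 'https://example.test/docs/guide/index.html';

// Resolve href against the base and compare hostnames.
function isLocalHref(href, base = BASE) {
  try {
    return new URL(href, base).hostname === new URL(base).hostname;
  } catch (e) {
    return false; // an href that cannot be parsed is treated as not local
  }
}

// Replace every [text](target) with a marker, as in the snippet above.
function tagLinks(text, base = BASE) {
  return text.replace(/\[([^\]]*)\]\(([^)]*)\)/g, (whole, label, href) =>
    isLocalHref(href, base) ? 'locrep' : 'extrep');
}

const text = '[external link](http://adomain.com/path/file.txt) ' +
  '[local link](../path/page.html) ' +
  '[local link](../path.org/http/file.com.php#fragment)';

console.log(tagLinks(text)); // "extrep locrep locrep"
```

The third link is classified as local even though its path contains .org, http, .com and #, which is the case a substring-based regex gets wrong.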
Upvotes: 0
Reputation:
There are two problems. \[.*?\] will blow past ] and match [link which should be ignore](http://google.com/) lorem ipsum lorem ips sum loreerm [link which shouldn't be ignored] just so it will match the assertions. You can fix 1 & 2 with this regex:
((!?\[[^\]]*?\])\((?:(?!http|www\.|\#|\.com|\.net|\.info|\.org).)*?\))
(                               # (1 start)
    ( !?\[ [^\]]*? \] )         # (2), Link
    \(                          # Open paren (
    (?:                         # Cluster
        (?!                     # Not any of these
            http
          | www\.
          | \#
          | \.com
          | \.net
          | \.info
          | \.org
        )
        .                       # Ok, grab this character
    )*?                         # End cluster, do 0 to many times
    \)                          # Close paren )
)                               # (1 end)
Metrics
----------------------------------
* Format Metrics
----------------------------------
Cluster Groups = 1
Capture Groups = 2
Assertions = 1
( ? ! = 1
Free Comments = 7
Character Classes = 1
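A quick check of the regex above against the example string from the question, as a sketch. The variable names (localLink, sample, found) are illustrative only:

```javascript
// The regex from this answer, unchanged, with the global flag added.
const localLink = /((!?\[[^\]]*?\])\((?:(?!http|www\.|\#|\.com|\.net|\.info|\.org).)*?\))/g;

// The example text from the question, with the external link first.
const sample =
  'dddd [link which should be ignore](http://google.com/) lorem ipsum lorem ips sum loreerm ' +
  "[link which shouldn't be ignored](../../../filepath/folder/some-other-folder/another-folder/one-last-folder/file-example.html). lorem ipsum lorem";

const found = sample.match(localLink) || [];

// Only the second (local) link is returned.
console.log(found.length); // 1
console.log(found[0]);
```

Because [^\]]*? cannot cross a ], the match can no longer start in the first link and end in the second. The lookahead is also applied one character at a time, so the checks stay inside the parentheses instead of scanning the rest of the line.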
Upvotes: 2