Reputation: 1251
I have text consisting of paragraphs with URLs interspersed. I would like to parse the string, creating HTML links from the URLs and using the text that follows each URL as the descriptive link text, i.e.
possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present
into
<a href="http://www.somewebsite.com/some/path/somepage.html">descriptive text which may or may not be present</a>
This SO article, JS: Find URLs in Text, Make Links, is relevant to what I'm attempting to do but simply places the url as the text within the anchor element.
I am successfully matching the url with
var urlRE= new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?([^ ])+");
but am unsure how to match the text that follows it.
I came across this post, Regex - Matching text AFTER certain characters, which seems applicable. I've attempted to wrap my RE in /(?<=my url pattern here).+/ but get an error stating that there is an invalid group and that this results in an invalid RE.
In that post J-Law mentions that
Variable-length lookbehinds aren’t allowed
Is this what I'm attempting to do?
Since I'm already matching the url I feel like I could easily do some substring math to get the desired results.
I'm just using this as an attempt to learn more about regex.
Thanks
Upvotes: 3
Views: 2531
Reputation: 45135
Just add another capturing group to capture all the stuff at the end and make your inner groups non-capturing. Something like:
var urlRE= new RegExp("((?:[a-zA-Z0-9]+://)?(?:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?(?:[a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(?::[0-9]+)?(?:[^ ])+)(.*)$");
var s = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present"
var match = urlRE.exec(s);
alert(match[0] + "\n\n" + match[1] + "\n\n" + match[2]);
// match is an array containing:
// ["http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present",
// "http://www.somewebsite.com/some/path/somepage.html",
// " descriptive text which may or may not be present"]
I wrapped your entire regex in brackets () to form the first capturing group, and inside that I made all your existing groups non-capturing with ?:. You don't absolutely need to do that (making them non-capturing), but it does simplify the output. Then I just added one more group (.*) to capture everything else until the end of the string $.
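To see what the non-capturing groups buy you, here is a small sketch using a deliberately simplified URL pattern. The pattern and the example string are made up for illustration and are not the regex above:

```javascript
// Simplified, hypothetical patterns: the only difference is (...) vs (?:...)
// on the inner groups.
var sample = "http://example.com rest of text";

// Inner groups capture, so they each take up a slot in the result array.
var withCaptures = /((https?):\/\/(\S+))(.*)$/.exec(sample);

// Inner groups are non-capturing, so only the two outer groups get slots.
var withoutCaptures = /((?:https?):\/\/(?:\S+))(.*)$/.exec(sample);

// withCaptures has 5 entries: whole match, url, scheme, host part, trailing text.
// withoutCaptures has 3 entries: whole match, url, trailing text.
var trailingA = withCaptures[4];
var trailingB = withoutCaptures[2];
```

Both versions match the same text; the second one just keeps the url and the trailing text at predictable positions.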
After .exec, if you have a match, the whole match will be in [0], the url part will be in [1] and the rest of your text in [2]. This is why we used the non-capturing groups, because otherwise you'd have a bunch of other captures that may or may not be useful.
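From there, building the anchor element is just string concatenation. A rough sketch, assuming one url per string; linkify is a made-up helper name, and the fallback to the url when no description follows is my own assumption about what you'd want. It also does no HTML escaping, so treat it as a starting point only:

```javascript
var urlRE = new RegExp("((?:[a-zA-Z0-9]+://)?(?:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?(?:[a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(?::[0-9]+)?(?:[^ ])+)(.*)$");

// Hypothetical helper: builds an <a> element string from the url and the text after it.
function linkify(text) {
  var match = urlRE.exec(text);
  if (!match) {
    return text; // no url found, hand the text back unchanged
  }
  var url = match[1];
  // Drop the leading space; if nothing follows the url, use the url itself as the link text.
  var label = match[2].trim() || url;
  return '<a href="' + url + '">' + label + '</a>';
}

var s = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present";
var html = linkify(s);
// html is now:
// <a href="http://www.somewebsite.com/some/path/somepage.html">descriptive text which may or may not be present</a>
```

Note that the text before the url is dropped here, which matches the output in your question. If you want to keep it, match.index tells you where the match starts, so text.slice(0, match.index) gives you the leading part.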
Upvotes: 4