N Klosterman

Reputation: 1251

JavaScript Regex: match text after pattern

I have text made up of paragraphs with URLs interspersed. I would like to parse the string, creating HTML links from the URLs and using the text that follows each URL as the descriptive link text, i.e.

possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present

into

<a href="http://www.somewebsite.com/some/path/somepage.html">descriptive text which may or may not be present</a>

This SO article, JS: Find URLs in Text, Make Links, is relevant to what I'm attempting to do, but it simply places the URL as the text within the anchor element.

I am successfully matching the URL with

var urlRE= new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?([^ ])+");

but am unsure how to match the text that follows it.

I came across this post, Regex - Matching text AFTER certain characters, which seems applicable. I've attempted to wrap my RE in /(?<=my url pattern here).+/ but get an error stating that there is an invalid group and that this results in an invalid RE.

In that post J-Law mentions that

Variable-length lookbehinds aren’t allowed

Is this what I'm attempting to do?

Since I'm already matching the URL, I feel like I could easily do some substring math to get the desired result.

I'm just using this as an attempt to learn more about regex.

Thanks

Upvotes: 3

Views: 2531

Answers (1)

Matt Burland

Reputation: 45135

Just add another capturing group to capture all the stuff at the end and make your inner groups non-capturing. Something like:

    var urlRE= new RegExp("((?:[a-zA-Z0-9]+://)?(?:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?(?:[a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(?::[0-9]+)?(?:[^ ])+)(.*)$");

    var s = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present";
    
    var match = urlRE.exec(s);
    alert(match[0] + "\n\n" + match[1] + "\n\n" + match[2]);

    // match contains:
    // ["http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present", 
    // "http://www.somewebsite.com/some/path/somepage.html", 
    // " descriptive text which may or may not be present"]

I wrapped your entire regex in parentheses () to form the first capturing group, and inside that I made all your existing groups non-capturing with ?:. You don't strictly need to make them non-capturing, but it does simplify the output. Then I added one more group, (.*), to capture everything else up to the end of the string ($).
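To make the effect of ?: concrete, here is a minimal sketch. The two patterns and the sample string are simplified, hypothetical stand-ins, not the URL regex above; they only illustrate how capturing inner groups add extra entries to the result array.

```javascript
// Hypothetical, simplified patterns used only to illustrate ?: (not the URL regex above).
var capturing = /((https?):\/\/(\S+))(.*)$/;          // inner groups capture
var nonCapturing = /((?:https?):\/\/(?:\S+))(.*)$/;   // inner groups do not capture
var sample = "see http://example.com/page some text";

var a = capturing.exec(sample);
var b = nonCapturing.exec(sample);

// a holds the whole match plus four captures; b holds the whole match plus two.
console.log(a.length, b.length);
// With capturing inner groups the trailing text moves to index 4;
// with non-capturing groups it stays at index 2.
console.log(a[4] === b[2]);
```

With the inner groups capturing, the index of the trailing text depends on how many groups precede it, which is easy to get wrong if the pattern is edited later.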

After calling .exec, if there is a match, the whole match will be in match[0], the URL in match[1], and the rest of your text (including its leading space) in match[2]. If there is no match, .exec returns null. This is why we used non-capturing groups: otherwise you'd have a bunch of other captures that may or may not be useful.
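As a rough sketch of how the two captures could be turned into the anchor element from the question, something like the following might work. The helper name linkify is hypothetical, the pattern is the same one written as a regex literal, and falling back to the URL when no description follows is an assumption about the desired behaviour. No HTML escaping is done here.

```javascript
// Same pattern as above, written as a regex literal instead of a string.
var urlRE = /((?:[a-zA-Z0-9]+:\/\/)?(?:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?(?:[a-zA-Z0-9.-]+\.[A-Za-z]{2,4})(?::[0-9]+)?(?:[^ ])+)(.*)$/;

// Hypothetical helper: replaces "<url> <description>" with an anchor element.
// Text before the URL is left untouched because replace only swaps the matched part.
function linkify(text) {
  return text.replace(urlRE, function (whole, url, rest) {
    // Assumption: when no description follows, use the URL itself as the link text.
    var label = rest.trim() || url;
    // Note: url and label are not HTML-escaped in this sketch.
    return '<a href="' + url + '">' + label + '</a>';
  });
}

var s = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present";
console.log(linkify(s));
```

Because the pattern ends in (.*)$ and is not global, this handles one URL per string; text containing several URLs would need to be split up first or matched with a different pattern.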

Upvotes: 4
