Ravi Kumar

Reputation: 1402

JavaScript regex to urlify text

My string contains many URL tokens like the following:

[http://www.someurl.com/path/to/resource/?some=params&crazy_chars=true_0_1_0_1]

I want to capture each of them and convert it to:

<a href="http://www.someurl.com/path/to/resource/?some=params&crazy_chars=true_0_1_0_1" target="_blank" class="exturl">http://www.someurl.com/path/to/resource/?some=params&crazy_chars=true_0_1_0_1</a>

So every URL inside square brackets should be found and replaced by an inline anchor element. So far I have found this regex for the URL pattern:

RegExp("\[(http|ftp|https)://[\w-]+(\.[\w-]+)+([\w.,@?^=%&amp;:/~+#-]*[\w@?^=%&amp;/~+#-])?\]", "gi");

But I am still not clear on how to do this in a single pass. Do I have to loop until no more matches are found?

Upvotes: 2

Views: 3238

Answers (4)

greatwolf

Reputation: 20838

I would write a helper function that takes a single bracketed token as input and returns the anchor tag for its URL if it matches. Split the big string into an array in which each element is one [] pair. Then it's just a matter of iterating over this array and passing each element to the helper function:

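// Turn a single "[url]" token into an anchor tag; return '' if it is not a URL.
function urlify(s)
{
  var urlpat = /\[((https?|ftp):\/\/\w+[^\]]*)\]/i;

  var matches = urlpat.exec(s);
  var anchor_url = '<a href="%1">%1</a>';
  return matches ? anchor_url.replace(/%1/g, matches[1]) : '';
}

var instring = '[http://www.someurl.com/path/to/resource/?some=params&crazy_chars=true_0_1_0_1]' +
               '[@ID 65421]' +
               '[http://google.com]';

// Collect every [...] token; match() returns null when there are none.
var arr = instring.match( /(\[[^\]]+\])/g ) || [];
// Use an index loop: for...in also visits inherited enumerable properties.
for (var i = 0; i < arr.length; i++)
{
  arr[i] = urlify(arr[i]);
}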

arr will contain:

[ '<a href="http://www.someurl.com/path/to/resource/?some=params&crazy_chars=true_0_1_0_1">http://www.someurl.com/path/to/resource/?some=params&crazy_chars=true_0_1_0_1</a>',
  '',
  '<a href="http://google.com">http://google.com</a>' ]

Upvotes: 0

Bergi

Reputation: 664599

Current I found Regex for URL pattern

That pattern was written as a regex literal, not as a string argument for the RegExp constructor. Inside a string literal, every backslash escapes the following character at the string level and never reaches the regex, so \[ arrives as [ and \w as w. Instead, use

/\[(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&amp;:\/~+#-]*[\w@?^=%&amp;\/~+#-])?\]/gi
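The difference between the two forms can be seen with a much shorter pattern. This is only a minimal sketch; the variable names are made up for illustration and the pattern is a stand-in for the full URL regex.

```javascript
// A string literal consumes single backslashes before RegExp ever sees them.
var fromString  = new RegExp("\[\w+\]");     // the pattern actually compiled is [w+]
var fromEscaped = new RegExp("\\[\\w+\\]");  // doubled backslashes survive the string
var fromLiteral = /\[\w+\]/;                 // a literal needs no doubling

console.log(fromString.source);          // [w+]
console.log(fromString.test("[abc]"));   // false: the class only matches "w" or "+"
console.log(fromEscaped.test("[abc]"));  // true
console.log(fromLiteral.test("[abc]"));  // true
```

If the constructor form is required (for example to build a pattern at runtime), doubling every backslash gives the same regex as the literal.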

But I am still not clear on how I can do it in single pass. Do I have to loop for till no matcher is found?

No, a simple replace call will suffice. You can put a capturing group around the url (between the square brackets) and then use the captures in the replacement string:

var regex = /\[((?:ftp|http)s?:\/\/[\w-]+(?:\.[\w-]+)+(?:[\w.,@?^=%&amp;:\/~+#-]*[\w@?^=%&amp;\/~+#-])?)\]/gi;
// here:       ^                                                                                       ^
// (the non-capturing groups are optional)
urlified = text.replace(regex, '<a href="$1" class="exturl">$1</a>');
// here:                                 ^^                 ^^

For more advanced replacement rules you might use the callback function parameter of replace.
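As a rough sketch of the callback form: the function passed to replace receives the whole match followed by each capture, and its return value becomes the replacement. The names escapeHtml and urlifyText are hypothetical, and the bracket pattern here is a deliberately simplified assumption rather than the full URL regex above.

```javascript
// Hypothetical helper: escape characters that are unsafe inside HTML attributes/text.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;')
          .replace(/</g, '&lt;')
          .replace(/>/g, '&gt;')
          .replace(/"/g, '&quot;');
}

// Simplified assumption: anything between [ and ] that starts with a known scheme
// and contains no whitespace or closing bracket.
var bracketedUrl = /\[((?:https?|ftp):\/\/[^\]\s]+)\]/gi;

function urlifyText(text) {
  // The callback gets the full match first, then capture group 1 (the URL).
  return text.replace(bracketedUrl, function (match, url) {
    var safe = escapeHtml(url);
    return '<a href="' + safe + '" target="_blank" class="exturl">' + safe + '</a>';
  });
}

console.log(urlifyText('see [http://google.com] and [@ID 65421]'));
```

A callback is useful here because the URL can be escaped before it is placed into the markup, which a plain '$1' replacement string cannot do.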

And of course you might (should) employ the regex improvements/simplifications the other answers suggested.

Upvotes: 2

Benjamin Toueg

Reputation: 10867

Let's suppose that:

  • there are no nested brackets such as [[]]
  • there are no empty brackets []
  • the URL never contains a bracket, an at sign (@), or a hash sign (#)
  • brackets only ever enclose [url], [@ID342892904], or [#sometag]

Then this simple regex will do the trick:

\[[^@#]+\]
  • \[ matches an opening bracket (symbol needs to be escaped)
  • [^@#]+ matches any character except @ and #, repeated 1 or more times
  • \] matches a closing bracket (symbol needs to be escaped)
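A possible way to plug this into replace is sketched below. Two things here are assumptions and not part of the pattern above: a capture group is added around the bracket contents, and ] is added to the excluded characters, because the + is greedy and would otherwise run from the first [ to the last ] when several tokens appear in one string. The variable names are illustrative only.

```javascript
var text = '[http://google.com] [@ID342892904] [#sometag] [http://example.com/a?b=c]';

// Variant with a capture group and with ] excluded from the character class.
var token = /\[([^@#\]]+)\]/g;
var result = text.replace(token, '<a href="$1">$1</a>');
console.log(result);

// For comparison: without ] in the class, two tokens are swallowed as one match.
var greedy = '[http://a.com] and [http://b.com]'.match(/\[[^@#]+\]/g);
console.log(greedy.length); // 1
```

The [@...] and [#...] tokens are left untouched because their first character is excluded by the class.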

Upvotes: 0

Ondra Žižka

Reputation: 46806

JavaScript's regex syntax is more or less the same as Java's.

The JTexy project (something like Markdown, but better) has a lot of regexes for various tasks, including URL matching.

 #(?<=^|[\\s(\\[<:\\x17])(?:https?://|www\\.|ftp://)[0-9.$TEXY_CHAR-][/\\d$TEXY_CHAR+\\.~%&?@=_:;\\#,\\xAD-]+[/\\d$TEXY_CHAR+~%?@=_\\#]#u

$TEXY_CHAR is defined somewhere in the project.

By the way, using brackets to enclose URLs isn't really a good idea. For example, PHP treats [...] in parameter names as array (hash) syntax, which is often used for checkboxes, so URLs can contain brackets themselves.

Upvotes: 0
