Percy

Reputation: 3115

Regex to create html anchor tag

I have a text field for user comments; a user may or may not insert a URL into it.

e.g. they could have any of the following (plus other variations):

What I want to do is match on these and change the string to include an HTML anchor tag.

Using the various other Stack Overflow answers about this subject I have come up with the below:

text = text.Trim();
text = Regex.Replace(text,
    @"((https?|ftp):\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s]{2,}|www\.[^\s]+\.[^\s]{2,})",
    "<a target='_blank' href='$1'>$1</a>");

This works almost perfectly: it matches all the required patterns. BUT when it matches www.google.com (without the http(s):// part), the anchor tag it creates isn't correct. The href needs the http:// prefix; without it, the browser treats the link as a URL relative to the current site.
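To make the problem concrete, here is a minimal repro sketch. The class name LinkRepro, the method name Linkify, and the sample sentence are made up for illustration; the pattern and replacement string are the ones from the snippet above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkRepro
{
    // Hypothetical helper wrapping the original Regex.Replace call unchanged.
    public static string Linkify(string text)
    {
        text = text.Trim();
        return Regex.Replace(text,
            @"((https?|ftp):\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s]{2,}|www\.[^\s]+\.[^\s]{2,})",
            "<a target='_blank' href='$1'>$1</a>");
    }

    public static void Main()
    {
        // The href comes out as 'www.google.com', with no scheme,
        // so a browser resolves it relative to the current page.
        Console.WriteLine(Linkify("Visit www.google.com today"));
        // Visit <a target='_blank' href='www.google.com'>www.google.com</a> today
    }
}
```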

How can I change the code above so that if the match doesn't contain the http:// part, it will add it to the href part of the anchor?

Interestingly, as I'm typing this question, the preview part is creating links out of my URLs above - all except my "trouble" one - the one without the http/ftp:// prefix.

Upvotes: 3

Views: 1115

Answers (1)

Wiktor Stribiżew

Reputation: 626728

Use a match evaluator to check whether Group 2 ((https?|ftp)) matched. If it did, use the match as the href unchanged; otherwise, prepend http:// to it.

using System;
using System.Text.RegularExpressions;

var text = "Look at http://google.com some more text here possibly,\nLook at www.google.com some more text here possibly";
text = text.Trim();
text = Regex.Replace(text,
    @"((https?|ftp)://(?:www\.|(?!www))[^\s.]+\.\S{2,}|www\.\S+\.\S{2,})",
    // Group 2 (the scheme) succeeds only when the first alternative matched.
    m => m.Groups[2].Success
        ? string.Format("<a target='_blank' href='{0}'>{0}</a>", m.Groups[1].Value)
        // No scheme (the www. form): prepend http:// to the href only.
        : string.Format("<a target='_blank' href='http://{0}'>{0}</a>", m.Groups[1].Value));
Console.WriteLine(text);

See the C# demo, output:

Look at <a target='_blank' href='http://google.com'>http://google.com</a> some more text here possibly, 
Look at <a target='_blank' href='http://www.google.com'>www.google.com</a> some more text here possibly

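Note that I replaced [^\s] with \S everywhere in the pattern to make it look "prettier"; the two are equivalent. I also dropped the unnecessary backslashes before / and before the . inside the character class.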

You may also remove the outer capturing group and use the pattern @"(https?|ftp)://(?:www\.|(?!www))[^\s.]+\.\S{2,}|www\.\S+\.\S{2,}" instead. The scheme then becomes Group 1, so check whether m.Groups[1].Success is true and use m.Value in the replacements.
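A minimal sketch of that variant follows. The class name Linkifier, the method name Linkify, and the field name UrlPattern are hypothetical, chosen only for this example, and the sample input is made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Linkifier
{
    // Without the outer group, Group 1 is the scheme (https?|ftp).
    // It participates only when the first alternative matches.
    private static readonly Regex UrlPattern = new Regex(
        @"(https?|ftp)://(?:www\.|(?!www))[^\s.]+\.\S{2,}|www\.\S+\.\S{2,}");

    // Hypothetical helper: wraps every matched URL in an anchor tag.
    public static string Linkify(string text)
    {
        return UrlPattern.Replace(text.Trim(), m =>
        {
            // Keep the match as-is when it has a scheme, else prepend http://.
            string href = m.Groups[1].Success ? m.Value : "http://" + m.Value;
            return string.Format("<a target='_blank' href='{0}'>{1}</a>", href, m.Value);
        });
    }

    public static void Main()
    {
        Console.WriteLine(Linkify("Look at http://google.com and www.google.com"));
    }
}
```

Building the href once and formatting a single template keeps the two branches from drifting apart if the anchor markup changes later.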

Upvotes: 3
