dzolnjan

Reputation: 1263

Regex replace domain substring with html tag in C#

I'm trying to replace plain domain-like substrings of an input string with 'a' tags, using a regex like this:

var pattern = @"[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,})";

var input = "text1 www.example.com text2 <a href='foo'>www.example.com</a> text3";

var result = Regex.Replace(input, pattern, "<a href='$0'>$0</a>");

This produces the following output:

text1 <a href='www.example.com'>www.example.com</a> text2 <a href='foo'><a href='www.example.com'>www.example.com</a></a> text3

This is wrong: the second domain is already inside a tag, so it ends up as a tag within a tag.

Is there a way to modify the regex pattern so that it does not match the second domain substring?

Perhaps by skipping any domain that is preceded by a '>' character (or followed by a '<' character)?

That would effectively generate this result:

text1 <a href='www.example.com'>www.example.com</a> text2 <a href='foo'>www.example.com</a> text3

Upvotes: 0

Views: 372

Answers (2)

Srb1313711

Reputation: 2047

Try this:

 (?i)(?<!>)((w{3}\.)[^.]+\.[a-z]+(\.?[a-z])*)

This assumes each domain begins with www. You can use it with your replace call, and it will match unless the domain is preceded by a >. This may not be exactly what you are looking for, but it's somewhere to start. Research negative lookbehinds, as I believe they will help you.
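As a rough sketch of how the negative-lookbehind idea could be applied to the pattern from the question, without requiring the www prefix: the names DomainLinker and Linkify are hypothetical, and the trailing lookahead is an added assumption rather than part of the pattern above. The lookbehind rejects word characters, dots, and hyphens as well as '>', because otherwise a match could simply start one character further into a domain that was skipped.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DomainLinker
{
    // Hypothetical helper: the question's domain pattern wrapped in two lookarounds.
    private static readonly Regex DomainPattern = new Regex(
        // Not preceded by '>', a word character, '.', or '-'. Rejecting the
        // last three stops a match from starting partway through a domain.
        @"(?<![>\w.-])" +
        // Domain pattern from the question, with single-backslash escapes.
        @"[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,})" +
        // Assumption: skip text that is followed by "</a>" with no '<' in
        // between, i.e. text that sits inside an existing anchor element.
        @"(?![^<]*</a>)");

    public static string Linkify(string input)
    {
        // $0 is the whole match, as in the question's replacement string.
        return DomainPattern.Replace(input, "<a href='$0'>$0</a>");
    }

    public static void Main()
    {
        var input = "text1 www.example.com text2 <a href='foo'>www.example.com</a> text3";
        Console.WriteLine(Linkify(input));
    }
}
```

For the input from the question, this should wrap only the first www.example.com and leave the existing tag untouched. It is still a regex working on markup, so domains inside other tags or attributes (an img src, for example) may be matched; an HTML parser is likely the safer choice for arbitrary input.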

Upvotes: 2

Thiago Vinicius

Reputation: 180

You can also try the following:

var pattern = @"(.*?)\s([\w*]+(\.{1}\w*)+)";

var result = Regex.Replace(input, pattern, "$1 <a href='$2'>$2</a>");

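This also matches domains that do not start with "www". Note that it requires a whitespace character directly before the domain, which is why the domain that follows the '>' is left alone.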

Upvotes: 0
