Reputation: 1263
I'm trying to replace plain domain-like substrings of an input string with 'a' tags, using a regex like this:
var pattern = @"[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,})";
var input = "text1 www.example.com text2 <a href='foo'>www.example.com</a> text3";
var result = Regex.Replace(input, pattern, string.Format("<a href='$0'>$0</a>"));
This produces the following output:
text1 <a href='www.example.com'>www.example.com</a> text2 <a href='foo'><a href='www.example.com'>www.example.com</a></a> text3
This is wrong: the second domain is already inside an 'a' tag, so it ends up as a tag within a tag.
Is there a way to modify the regex pattern so that it does not match the second domain substring?
Perhaps by rejecting a match that has a '>' char right before the domain substring (or a '<' char right after it)?
The desired result is:
text1 <a href='www.example.com'>www.example.com</a> text2 <a href='foo'>www.example.com</a> text3
Upvotes: 0
Views: 372
Reputation: 2047
Try this:
(?i)(?<!>)((w{3}\.)[^.]+\.[a-z]+(\.?[a-z])*)
This assumes each domain begins with www. You can use it with your existing replace call, and it will match unless the domain is preceded by a >. This may not be exactly what you are looking for, but it's somewhere to start. Research negative lookbehinds, as I believe they will help you.
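If you would rather keep the original pattern from the question than require a www prefix, here is a rough sketch of the same negative-lookbehind idea. The names DomainLinker and Linkify are made up for illustration. The lookbehind rejects '>' and also any domain character, on the assumption that a match must not start in the middle of a domain; otherwise the regex would skip the first 'w' after '>' and match "ww.example.com" instead.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class DomainLinker
{
    // (?<![>A-Za-z0-9.-]) : the character before the match must not be '>'
    // or a domain character, so a match cannot start right after a tag
    // or partway through a domain.
    private static readonly Regex DomainPattern = new Regex(
        @"(?<![>A-Za-z0-9.-])[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}");

    public static string Linkify(string input)
    {
        // $0 is the whole match.
        return DomainPattern.Replace(input, "<a href='$0'>$0</a>");
    }

    public static void Main()
    {
        var input = "text1 www.example.com text2 <a href='foo'>www.example.com</a> text3";
        Console.WriteLine(Linkify(input));
    }
}
```

For the input from the question this should leave the second domain untouched. It only looks at the single character before the match, so a domain inside an attribute value (href='www.example.com') or after other text inside an existing tag would still be wrapped. For arbitrary HTML an HTML parser is probably the safer route.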
Upvotes: 2
Reputation: 180
What you can also try is the following:
var pattern = @"(.*?)\s([\w*]+(\.{1}\w*)+)";
var result = Regex.Replace(input, pattern, string.Format("$1 <a href='$2'>$2</a>"), RegexOptions.None);
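It would also match domains that do not start with "www". Note that it only matches a domain that is preceded by whitespace, so a domain at the very start of the input is skipped.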
Upvotes: 0