Reputation: 12892
I have the following to detect and replace links:
// need to find anchors
Regex urlRx = new Regex(@"((https?|ftp|file)\://|www.)[A-Za-z0-9\.\-]+(/[A-Za-z0-9\?\#\&\=;\+!'\(\)\*\-\._~%]*)*", RegexOptions.IgnoreCase);
MatchCollection matches = urlRx.Matches(source);
foreach (Match match in matches)
{
    source = source.Replace(match.Value, "<a target=\"_blank\" href='" + match.Value + "'>" + match.Value + "</a>");
}
However, when source contains an anchor, this doesn't quite work, because it replaces the innards of the already-existing anchor with another anchor. How can I prevent this from happening?
Sample i/o:
http://www.google.com -> <a target="_blank" href="http://www.google.com">http://www.google.com</a>
Pre-existing anchors (<a></a>) -> unchanged
I think refusing to match any URL that is preceded by a non-whitespace character (or a quote, ") would be valid, but I don't know how to do that.
Upvotes: 0
Views: 109
Reputation: 6974
All you need to do is check whether the matched URL is already the target of a pre-existing anchor:
// Requires: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
Regex urlRx = new Regex(@"((https?|ftp|file)\://|www.)[A-Za-z0-9\.\-]+(/[A-Za-z0-9\?\#\&\=;\+!'\(\)\*\-\._~%]*)*", RegexOptions.IgnoreCase);
MatchCollection matches = urlRx.Matches(source);
// The alternation is grouped so that both quote styles apply only to the href value.
var rxAnchor = new Regex("<a [^>]*href=(?:'(?<href>.*?)'|\"(?<href>.*?)\")", RegexOptions.IgnoreCase);
foreach (Match match in matches)
{
    // Collect the href values of all anchors currently in source.
    List<string> urls = rxAnchor.Matches(source).OfType<Match>().Select(m => m.Groups["href"].Value).ToList();
    if (urls.Contains(match.Value))
    {
        // This URL is already linked by an existing anchor.
        string urlToAppend = match.Value;
        // DO Your Stuff here
    }
    else
    {
        source = source.Replace(match.Value, "<a target=\"_blank\" href='" + match.Value + "'>" + match.Value + "</a>");
    }
}
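If you would rather avoid the loop, a different technique is a single Regex.Replace pass in which existing anchors are matched first and returned untouched, so the URL pattern never sees their contents. The following is only a rough sketch of that idea, not a drop-in replacement: the Linkifier class and Linkify method names are made up for illustration, the URL pattern is the one from the question with the dot after www escaped, and it assumes anchors are not nested or malformed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the class and method names are illustrative only.
public static class Linkifier
{
    // First alternative: a whole existing <a ...>...</a> element, consumed so
    // that its contents are never scanned for URLs.
    // Second alternative: the URL pattern from the question (with "www\." escaped),
    // captured in the named group "url".
    private static readonly Regex AnchorOrUrl = new Regex(
        @"<a\b[^>]*>.*?</a>|(?<url>((https?|ftp|file)\://|www\.)[A-Za-z0-9\.\-]+(/[A-Za-z0-9\?\#\&\=;\+!'\(\)\*\-\._~%]*)*)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string Linkify(string source)
    {
        return AnchorOrUrl.Replace(source, m =>
            m.Groups["url"].Success
                // Bare URL: wrap it in a new anchor.
                ? "<a target=\"_blank\" href='" + m.Value + "'>" + m.Value + "</a>"
                // Existing anchor: return it unchanged.
                : m.Value);
    }
}
```

Calling Linkifier.Linkify(source) then wraps bare URLs and leaves pre-existing anchors as they are. Because each match is replaced in place, a URL that occurs several times is also wrapped only once per occurrence, which the string.Replace loop does not guarantee.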
Upvotes: 1