Alexandr Sulimov

Reputation: 1924

Regex: replace URLs with <a> tags

In HTML I need to replace bare URLs with <a> tags.

Only http://google3.com:1139 and http://google6.com:1139 should be replaced (the URLs that are not already inside an <a> element):

<div>
  <a href="http://google1.com:1139" target="_blank">http://google2.com:1139</a> 
  http://google3.com:1139
</div>
<div>
  <a href="http://google4.com:1139" target="_blank">http://google5.com:1139</a>
  http://google6.com:1139
</div>

The result must be:

<div>
  <a href="http://google1.com:1139" target="_blank">http://google2.com:1139</a> 
  <a href="http://google3.com:1139" target="_blank">http://google3.com:1139</a> 
</div>
<div>
  <a href="http://google4.com:1139" target="_blank">http://google5.com:1139</a>
  <a href="http://google6.com:1139" target="_blank">http://google6.com:1139</a> 
</div>

I found this solution:

        var result = Regex.Replace("<div><a href=\"http://google1.com:1139\" target=\"_blank\">http://google2.com:1139</a>http://google3.com:1139</div><div><a href=\"http://google4.com:1139\" target=\"_blank\">http://google5.com:1139</a>http://google6.com:1139</div>", 
                                   @"((?<!href=['""]?)(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)",
                                   "<a target='_blank' href='$1'>$1</a>");

But I need to replace only URLs that

  1. start with http but not with href="http (already done)

  2. are not followed by </a>

  3. or, alternatively: skip everything between <a ... </a>

Upvotes: 0

Views: 553

Answers (1)

Dmitry Egorov

Reputation: 9650

I take it the requirements may be rephrased as follows:

  • everything between <a and </a> should be left intact (this includes href attribute values)
  • any URLs of the given pattern outside <a ... </a> elements should be wrapped in anchor tags.

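This may be achieved by searching for two patterns, <a.*?</a> and <some URL>, as alternatives in a single regex. Because the engine scans left to right and tries the first alternative first, a whole <a ...>...</a> element is consumed in one match, so the URLs in its href attribute and link text are never matched on their own. Each match is then replaced by itself if the first pattern matched, and by a wrapped URL if the second pattern matched: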

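using System.Text.RegularExpressions;

var result = Regex.Replace(html,
    // 1st alternative: a whole <a ...>...</a> element (kept as is);
    // 2nd alternative: a bare URL of the form scheme://host:port (wrapped)
    @"<a.*?</a>|(?:https?|ftp)://[\w_.-]+:\d+",
    m => m.Value.StartsWith("<") 
        ? m.Value
        : string.Format("<a target='_blank' href='{0}'>{0}</a>", m.Value));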

Demo: https://ideone.com/Jq1s8y
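For reference, here is a self-contained sketch of the same approach run on the question's sample HTML. It assumes C# 9+ top-level statements; the helper name LinkifyBareUrls is hypothetical, while the pattern and the replacement string are taken unchanged from the code above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: wraps bare URLs in <a> tags, leaves existing <a ...>...</a> untouched.
static string LinkifyBareUrls(string input) =>
    Regex.Replace(input,
        // Alternative 1: a whole anchor element (lazy, so it stops at the first </a>).
        // Alternative 2: a bare URL of the simplified form scheme://host:port.
        @"<a.*?</a>|(?:https?|ftp)://[\w_.-]+:\d+",
        // Anchor elements start with '<' and are returned unchanged; bare URLs are wrapped.
        m => m.Value.StartsWith("<")
            ? m.Value
            : string.Format("<a target='_blank' href='{0}'>{0}</a>", m.Value));

// Sample input from the question, on one line as in the question's own code.
var html = "<div><a href=\"http://google1.com:1139\" target=\"_blank\">http://google2.com:1139</a>http://google3.com:1139</div>"
         + "<div><a href=\"http://google4.com:1139\" target=\"_blank\">http://google5.com:1139</a>http://google6.com:1139</div>";

Console.WriteLine(LinkifyBareUrls(html));
```

Only http://google3.com:1139 and http://google6.com:1139 get wrapped; the existing anchors, including the URLs in their href attributes, come back exactly as they were.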

P.S.

I simplified the URL regex for the sake of conciseness. A real application may require a more thorough pattern.
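As one possible illustration of a more extended pattern, the sketch below (hypothetical helper LinkifyBareUrlsExtended, C# 9+ top-level statements) swaps in the fuller URL regex from the question, with &amp; corrected to &, so URLs without a port are also matched. It adds RegexOptions.Singleline because . does not match line breaks by default: with the simplified version, an <a> tag split over several lines is not matched by <a.*?</a>, and the URL in its href gets wrapped. RegexOptions.IgnoreCase and the \b after <a (so that tags such as <abbr> are not taken for anchors) are further assumptions of this sketch. It remains an approximation, not a full URL grammar.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical extended variant; the URL part is adapted from the question's pattern.
static string LinkifyBareUrlsExtended(string input) =>
    Regex.Replace(input,
        // Alternative 1: a whole <a ...>...</a> element; \b keeps <abbr> etc. out.
        // Alternative 2: a bare URL; the last character class excludes '.', ',' and ':'
        // so trailing punctuation is not pulled into the link.
        @"<a\b.*?</a>|(?:https?|ftp)://[\w-]+(?:\.[\w-]+)+(?:[\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?",
        m => m.Value.StartsWith("<")
            ? m.Value
            : string.Format("<a target='_blank' href='{0}'>{0}</a>", m.Value),
        // Singleline: . also matches '\n', so multi-line anchors are skipped as a whole.
        // IgnoreCase: <A ...>...</A> is skipped too.
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

// An anchor whose opening tag spans two lines, followed by a bare URL.
var html = "<div>\n  <a href=\"http://google1.com:1139\"\n     target=\"_blank\">http://google2.com:1139</a>\n  http://google3.com:1139\n</div>";

Console.WriteLine(LinkifyBareUrlsExtended(html));
```

Here only http://google3.com:1139 is wrapped; the multi-line anchor is left intact.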

Upvotes: 1
