lordy1981

Reputation: 188

C# replace multiple href values

I have a block of HTML that looks something like this:

<p><a href="docs/123.pdf">33</a></p>

There are hundreds of anchor links whose href I need to replace based on the anchor text. For example, I need to replace the link above with something like:

<a href="33.html">33</a>

I will need to take the value 33 and look it up in my database to find the new link to put in the href.

Everything else in the original HTML must stay as it is.

How can I do this?

Upvotes: 2

Views: 3392

Answers (5)

Joel Beckham

Reputation: 18664

Although this doesn't answer your question, the HTML Agility Pack is a great tool for manipulating and working with HTML: http://html-agility-pack.net

It could at least make grabbing the values you need and doing the replaces a little easier.

This related question contains links on using the HTML Agility Pack: How to use HTML Agility pack
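As a rough illustration of how the Agility Pack could be applied to the question, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced; `LookupHref` is a hypothetical placeholder for the database lookup and is not part of the library.

```csharp
using System;
using HtmlAgilityPack; // assumes the "HtmlAgilityPack" NuGet package is installed

static class AgilityPackDemo
{
    // Hypothetical stand-in for the database lookup described in the question.
    static string LookupHref(string anchorText)
    {
        return anchorText + ".html";
    }

    public static string Rewrite(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            foreach (HtmlNode a in anchors)
            {
                // Replace only the href; the rest of the markup is left alone.
                a.SetAttributeValue("href", LookupHref(a.InnerText.Trim()));
            }
        }
        return doc.DocumentNode.OuterHtml;
    }

    static void Main()
    {
        Console.WriteLine(Rewrite("<p><a href=\"docs/123.pdf\">33</a></p>"));
    }
}
```

Because the parser is tolerant of malformed markup, this approach should also cope with HTML that a regex or an XML parser would choke on.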

Upvotes: 5

Alan Moore

Reputation: 75272

So, what you want to do is generate the replacement string based on the contents of the match. Consider using one of the Regex.Replace overloads that take a MatchEvaluator. Example:

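using System;
using System.Text.RegularExpressions;

static class Program
{
  static void Main()
  {
    // Capture the anchor text; the old href is matched but discarded.
    Regex r = new Regex(@"<a href=""[^""]+"">([^<]+)");

    string s0 = @"<p><a href=""docs/123.pdf"">33</a></p>";
    string s1 = r.Replace(s0, m => GetNewLink(m));

    Console.WriteLine(s1);
  }

  // Builds the replacement opening tag and text from the captured anchor text.
  static string GetNewLink(Match m)
  {
    return string.Format(@"<a href=""{0}.html"">{0}", m.Groups[1].Value);
  }
}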

I've actually taken it a step further and used a lambda expression instead of explicitly creating a delegate method.
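To show how the database lookup from the question might slot into the evaluator, here is a sketch under stated assumptions: the `Links` dictionary is a hypothetical in-memory stand-in for the real table, and anchors with no mapping are left unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class HrefLookupDemo
{
    // Hypothetical stand-in for the database: anchor text -> new href.
    static readonly Dictionary<string, string> Links = new Dictionary<string, string>
    {
        { "33", "33.html" },
    };

    static readonly Regex Anchor = new Regex(@"<a href=""[^""]+"">([^<]+)</a>");

    public static string Rewrite(string html)
    {
        return Anchor.Replace(html, m =>
        {
            string text = m.Groups[1].Value;
            string href;
            // No mapping found: return the matched text as-is.
            if (!Links.TryGetValue(text.Trim(), out href))
            {
                return m.Value;
            }
            return string.Format(@"<a href=""{0}"">{1}</a>", href, text);
        });
    }

    static void Main()
    {
        string html = @"<p><a href=""docs/123.pdf"">33</a> <a href=""docs/9.pdf"">99</a></p>";
        // prints: <p><a href="33.html">33</a> <a href="docs/9.pdf">99</a></p>
        Console.WriteLine(Rewrite(html));
    }
}
```

With hundreds of links, loading the mappings once up front (as the dictionary does here) is likely cheaper than issuing one query per match.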

Upvotes: 0

Nicholas Carey

Reputation: 74385

Slurp your HTML into an XmlDocument (your markup is well-formed, isn't it?). Then use XPath to find all the <a> tags with an href attribute. Apply the transform and assign the new value to the href attribute. Then write the XmlDocument out.

Easy!
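A minimal sketch of those steps, assuming the markup is well-formed XML with no default namespace (an XHTML namespace would need an XmlNamespaceManager and a prefixed XPath). `LookupHref` is a hypothetical placeholder for the database lookup.

```csharp
using System;
using System.Xml;

static class XmlHrefRewriter
{
    // Hypothetical stand-in for the database lookup described in the question.
    static string LookupHref(string anchorText)
    {
        return anchorText + ".html";
    }

    public static string Rewrite(string xhtml)
    {
        var doc = new XmlDocument();
        doc.PreserveWhitespace = true; // keep the original whitespace between elements
        doc.LoadXml(xhtml);            // throws XmlException if the markup is not well-formed

        // Find every <a> element that carries an href attribute.
        foreach (XmlNode node in doc.SelectNodes("//a[@href]"))
        {
            var a = (XmlElement)node;
            a.SetAttribute("href", LookupHref(a.InnerText.Trim()));
        }
        return doc.OuterXml;
    }

    static void Main()
    {
        Console.WriteLine(Rewrite("<p><a href=\"docs/123.pdf\">33</a></p>"));
    }
}
```

Real-world HTML often fails the well-formedness check (unclosed tags, unquoted attributes, entities such as `&nbsp;`), in which case `LoadXml` throws and a tolerant HTML parser is the safer choice.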

Upvotes: 1

agent-j

Reputation: 27953

Consider using the following rough algorithm.

using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

static class Program
{
  static void Main ()
  {
    string html = "<p><a href=\"docs/123.pdf\">33</a></p>"; // read the whole html file into this string.
    StringBuilder newHtml = new StringBuilder (html);
    Regex r = new Regex (@"\<a href=\""([^\""]+)\"">([^<]+)"); // 1st capture for the replacement and 2nd for the find
    foreach (var match in r.Matches(html).Cast<Match>().OrderByDescending(m => m.Index))
    {
       string text = match.Groups[2].Value;
       string newHref = DBTranslate (text);
       newHtml.Remove (match.Groups[1].Index, match.Groups[1].Length);
       newHtml.Insert (match.Groups[1].Index, newHref);
    }

    Console.WriteLine (newHtml);
  }

  static string DBTranslate(string s)
  {
    return "junk_" + s;
  }
}

(The OrderByDescending makes sure the indexes don't change as you modify the StringBuilder.)

Upvotes: 1

Daniel

Reputation: 31609

Use a regexp to find the values and replace them. A regexp like "<p><a href=\"[^\"]+\">([^<]+)<\\/a><\\/p>" will match the link and capture the anchor text.

Upvotes: 0
