RandomWebGuy

Reputation: 1439

Remove hyperlinks from text but keep anchor text

I need to strip link tags from a body of text but keep the anchor text. For example:

<a href ="">AnchorText</a>

needs to become just:

AnchorText

I was considering using the following RegEx:

<(.{0}|/)(a|A).*?>

Is a RegEx the best way to go about this? If so, is the above RegEx pattern adequate? If RegEx isn't the way to go, what's a better solution? This needs to be done server side.

Upvotes: 3

Views: 3090

Answers (5)

Lee

Reputation: 41

I have been trying to do the same and found the following solution:

  1. Export the text to CSV.
  2. Open the file in Excel.
  3. Run Find and Replace with <*> as the search pattern and an empty replacement, which removes the link tags and leaves the anchor text.
  4. Import the result again to overwrite existing content.
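For anyone who would rather skip the spreadsheet round trip, here is a rough sketch of the same find-and-replace in JavaScript. It assumes the `<*>` wildcard matches each tag separately, and the names `anyTag`, `stripAllTags` and `cell` are made up for illustration. Note that this kind of pattern removes every tag, not only links:

```javascript
// Regex analogue of the "<*>" wildcard: "<", then anything up to the next ">".
// Assumption: the wildcard matches one tag at a time rather than spanning tags.
const anyTag = /<[^>]*>/g;

// Hypothetical helper: removes every tag and keeps the text between them.
function stripAllTags(text) {
  return text.replace(anyTag, "");
}

const cell = 'See <a href="page.html">the <b>docs</b></a> for details.';
console.log(stripAllTags(cell)); // See the docs for details.
```

The `<b>` tag disappears along with the link, so this only suits content where links are the only markup you care about keeping out.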

Upvotes: 1

BrokenGlass

Reputation: 160852

You could just use HtmlAgilityPack:

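// Requires the HtmlAgilityPack NuGet package.
string sampleHtml = "<a href =\"\">AnchorText</a>";
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(sampleHtml);
// InnerText returns the text of all nodes, so every tag is dropped, not only <a>.
string text = doc.DocumentNode.InnerText; // output: AnchorText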

Upvotes: 3

stema

Reputation: 92976

Your regex will do the job. You can write it a bit more simply as:

</?(a|A).*?>

/? means zero or one /, so it is equivalent to your (.{0}|/).
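As a quick check, here is a small JavaScript sketch that runs both patterns on the sample from the question. One caveat worth knowing: because the pattern only requires the tag name to start with "a", it also strips tags such as `<abbr>` or `<address>`. The `anchorOnly` pattern is not from the question; it is a hypothetical tighter variant, and `stripTags`, `sample` and `mixed` are names made up for this example.

```javascript
// The pattern from the question and the simplified form suggested above.
const original = /<(.{0}|\/)(a|A).*?>/g;
const simplified = /<\/?(a|A).*?>/g;

// Hypothetical tighter variant: after "a", the next character must be
// whitespace, "/" or ">", so tags that merely start with "a" are left alone.
const anchorOnly = /<\/?a(?=[\s>\/])[^>]*>/gi;

// Hypothetical helper: removes whatever the given pattern matches.
function stripTags(html, pattern) {
  return html.replace(pattern, "");
}

const sample = '<a href ="">AnchorText</a>';
const mixed = '<abbr title="x">HTML</abbr> and <a href="#">link</a>';

console.log(stripTags(sample, original));   // AnchorText
console.log(stripTags(sample, simplified)); // AnchorText
console.log(stripTags(mixed, simplified));  // HTML and link
console.log(stripTags(mixed, anchorOnly));  // <abbr title="x">HTML</abbr> and link
```

If the text only ever contains plain links, the simpler pattern is enough. The tighter one is only worth it when other tags beginning with "a" can appear.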

Upvotes: 5

Tejs

Reputation: 41236

Use jQuery replaceWith:

$('a').replaceWith(function()
{
    return $('<span/>').text($(this).text());
});

This assumes you are doing it on the client side.

Upvotes: 1

John Batdorf

Reputation: 2542

I think a regex is the best way to accomplish this, and your pattern looks like it should work.

Upvotes: 1
