Justin808

Reputation: 21522

C# regex to modify all matching hrefs

How can I replace

<a href="page">Text</a>

with

<a href="page.html">Text</a>

where page and Text can be any set of characters?

Upvotes: 1

Views: 74

Answers (2)

Gebb

Reputation: 6556

You shouldn't parse HTML with regular expressions. See the answer to this question for details.

Update: As TrueWill pointed out, you might want to do the replacement with Html Agility Pack. In some special cases the regex proposed by FailedDev will do, although I would put a \b after the <a so that other tags starting with "a" (such as <abbr>) are excluded: @"(?<=<a\b[^>]*?\bhref\s*=\s*(['""]))(.*)(?=\1.*?>)"
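For reference, a minimal sketch of the Html Agility Pack route might look like the following. It assumes the HtmlAgilityPack NuGet package is referenced by the project; the class and method names (HrefFixer, AppendHtmlExtension) are made up for illustration and are not part of the library.

```csharp
using HtmlAgilityPack;

public static class HrefFixer
{
    // Hypothetical helper: appends ".html" to the href of every <a> element.
    public static string AppendHtmlExtension(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            foreach (var anchor in anchors)
            {
                string href = anchor.GetAttributeValue("href", string.Empty);
                anchor.SetAttributeValue("href", href + ".html");
            }
        }

        // Serialize the modified document back to markup.
        return doc.DocumentNode.OuterHtml;
    }
}
```

Because the parser locates the attribute itself, this does not depend on quoting style, attribute order, or line breaks inside the tag. Note that it blindly appends the suffix, so an href that already ends in .html would need an extra check.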

Upvotes: 1

FailedDev

Reputation: 26940

This will work. Note that it captures only the value inside href.

resultString = Regex.Replace(subjectString, @"(?<=<a[^>]*?\bhref\s*=\s*(['""]))(.*)(?=\1.*?>)", "$2.html");

The replacement then appends .html to the captured value. You may wish to adapt it to your needs.

Edit: Before the flame wars begin: yes, this will work for your specific example, not for all possible HTML on the internet.
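One limitation worth spelling out: (.*) is greedy, so when two links sit on the same line the match runs from the first href value to the last closing quote, and only the last link gets the suffix. A sketch of a variant that uses a lazy (.*?) instead, together with the \b suggested in the other answer, could look like this. The class and method names are hypothetical, and this is still a regex over HTML, so the same caveats apply.

```csharp
using System.Text.RegularExpressions;

public static class HrefRewriter
{
    // The lookbehind anchors the match right after the opening quote of an
    // href value inside an <a ...> tag and stores that quote in group 1.
    // Group 2 lazily captures the value up to the next matching quote.
    private static readonly Regex HrefValue = new Regex(
        @"(?<=<a\b[^>]*?\bhref\s*=\s*(['""]))(.*?)(?=\1.*?>)",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: appends ".html" to every matched href value.
    public static string AppendHtml(string html)
    {
        return HrefValue.Replace(html, "$2.html");
    }
}
```

Calling HrefRewriter.AppendHtml on the markup from the question, <a href="page">Text</a>, gives <a href="page.html">Text</a>, and several links on one line are each rewritten separately.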

Upvotes: 1
