user1630809

Reputation: 532

C# and regular expressions: recursive replace until specific string

I have a repeating HTML fragment like:

string html = "<input id=\"txt0\" value=\"hello\"></input>some undefined text<input id=\"txt1\" value=\"world\"></input>";

The fragment can repeat n times (n = 2 in the example), where n is not known in advance.

I would like to replace the text inside each 'value' attribute ('hello' and 'world' in the example) with the corresponding entry of an array, using regular expressions.

Regex rg = new Regex(which pattern?, RegexOptions.IgnoreCase);
int count= rg.Split(html).Length - 1; // in the example count = 2

for (int i = 0; i < count; i++)
{
     html= rg.Replace(html, @"value=""" + myarray[i] + @""">", 1);
}

My problem is that I cannot find the right regex pattern to make these substitutions.

If I use something like:

Regex rg = new Regex(@"value="".*""", RegexOptions.IgnoreCase);
int count= rg.Split(html).Length - 1;

for (int i = 0; i < count; i++)
{
     html= rg.Replace(html, @"value=""" + myarray[i] + @"""", 1);
}

I get HTML like this:

<input id="txt0" value="lorem ipsum"></input>

because .* in the pattern matches too many characters, while I need it to stop before the next '<input' occurrence.

The result should be something like:

<input id="txt0" value="lorem ipsum"></input>some undefined text<input id="txt1" value="another text"></input>

Any suggestion or help would be much appreciated. Thanks!

Upvotes: 0

Views: 1762

Answers (2)

rikitikitik

Reputation: 2450

While I'm inclined to nudge you towards using an HTML parser, if your HTML input is as simple as in your example and you have no funky HTML like the one L.B shows in his answer, the solution to your problem is simply to make the quantifier non-greedy:

    Regex rg = new Regex(@"value="".*?""", RegexOptions.IgnoreCase);

The question mark after .* makes the quantifier lazy, so the match ends at the first closing quote instead of the last one.
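
One caveat about the loop in the question: calling Replace(html, ..., 1) repeatedly rewrites the first match every time, because the text that was just inserted still matches the pattern. A minimal sketch that avoids this, assuming markup as simple as in the question, uses the Regex.Replace overload that takes a MatchEvaluator and hands out the array entries in order. The class name, method name and sample values are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ValueReplaceSketch
{
    // Replaces each value="..." in document order with the next array entry.
    public static string ReplaceValues(string html, string[] replacements)
    {
        // Lazy quantifier: .*? stops at the first closing quote.
        var rg = new Regex(@"value="".*?""", RegexOptions.IgnoreCase);
        int i = 0;
        return rg.Replace(html, m =>
            // Matches beyond the end of the array are left untouched.
            i < replacements.Length
                ? "value=\"" + replacements[i++] + "\""
                : m.Value);
    }

    public static void Main()
    {
        string html = "<input id=\"txt0\" value=\"hello\"></input>some undefined text<input id=\"txt1\" value=\"world\"></input>";
        Console.WriteLine(ReplaceValues(html, new[] { "lorem ipsum", "another text" }));
        // prints: <input id="txt0" value="lorem ipsum"></input>some undefined text<input id="txt1" value="another text"></input>
    }
}
```

The replacement strings are inserted verbatim here, so they would need to be encoded first if they can contain quotes or markup.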

Upvotes: 0

L.B

Reputation: 116108

Don't try to parse HTML with regex, as others pointed out in the comments.

Suppose you have an input whose value is itself markup, <input id=txt2 value="x">:

<input id=txt1 value='<input id=txt2 value="x">' >

Would you easily be able to parse it?

Therefore, use an HTML parser. For your sample I will use Html Agility Pack:
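
To make the problem concrete, here is a small sketch (class and method names are made up for illustration) that runs a simple lazy pattern, value=".*?", against that input. The pattern never sees the single-quoted outer attribute and instead matches the text nested inside it:

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedValueDemo
{
    // Returns the first substring the lazy value="..." pattern matches.
    public static string FirstMatch(string html)
    {
        var rg = new Regex(@"value="".*?""", RegexOptions.IgnoreCase);
        return rg.Match(html).Value;
    }

    public static void Main()
    {
        string html = "<input id=txt1 value='<input id=txt2 value=\"x\">' >";
        // The outer attribute uses single quotes, so only the inner one matches.
        Console.WriteLine(FirstMatch(html)); // prints: value="x"
    }
}
```

A replacement based on this match would rewrite part of the outer input's value rather than the value itself.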

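using System.Collections.Generic;

string html = "<input id=\"txt0\" value=\"hello\"></input>some undefined text<input id=\"txt1\" value=\"world\"></input>";
var myarray = new List<string>() { "val111", "val222", "val333" };

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

// Assign the replacement values to the inputs in document order.
int count = 0;
foreach (var inp in doc.DocumentNode.Descendants("input"))
{
    // Skip inputs without a value attribute; stop assigning once the list is used up.
    if (inp.Attributes["value"] != null && count < myarray.Count)
        inp.Attributes["value"].Value = myarray[count++];
}

// The modified markup can be read back from the document.
string result = doc.DocumentNode.OuterHtml;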

Upvotes: 1
