VB.net: Extract and replace all instances of HTML

Question

I am working on manipulating/extracting data from well-formed HTML in one of our legacy systems. I need to use regex to parse the HTML, find certain patterns, extract the data, and return some modified HTML. I know that regex and HTML are never the answer but, given that I know exactly where the data is coming from and that the data is properly structured, I am confident that this will work for this particular situation.

The HTML that I am working with has the following pattern:

Name1: Some text goes here

Name2: Some different text goes here

Name3: Some other different text goes here

I need to change the HTML to the following:

Name1Some text goes here
Name2Some different text goes here
Name3Some other different text goes here

Basically, I want to take the inner text, wrap it in a p tag and then remove the trailing br.

I want to do something like the following:

Dim html As String = [The HTML goes here]
html = Regex.Replace(html, ":(.+?)<br />", "<p>(.+?)</p>", RegexOptions.Multiline)

but it obviously isn't working.

In VB.net, how do I replace all desired instances of HTML with the new HTML?

NakedBrunch · Accepted Answer

Give this a shot:

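' Requires: Imports System.Text.RegularExpressions
Dim html As String = [The HTML goes here]
' Wrap the captured text in a p tag; the colon and the trailing br are dropped.
Dim evaluator As MatchEvaluator = Function(m As Match)
                                      Return "<p>" & m.Groups(1).Value & "</p>"
                                  End Function
' <br\s*/?> matches <br>, <br/> and <br />
html = Regex.Replace(html, ":(.+?)<br\s*/?>", evaluator, RegexOptions.Multiline)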
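
For a replacement this simple, a `$1` substitution in the replacement string should do the same job without a MatchEvaluator. Here is a minimal self-contained sketch of that idea. The `<b>` tags around the names, the `<br />` spelling, and the helper name `WrapInParagraphs` are assumptions made for illustration, since the exact markup isn't visible in the question.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module Program
    ' Assumed pattern: a colon, optional whitespace, the text (captured lazily),
    ' then a br tag in any common spelling (<br>, <br/>, <br />).
    Private Const Pattern As String = ":\s*(.+?)\s*<br\s*/?>"

    ' Hypothetical helper: wraps each captured text in <p>...</p> and drops
    ' the colon and the trailing br. $1 refers to the first capture group.
    Function WrapInParagraphs(html As String) As String
        Return Regex.Replace(html, Pattern, "<p>$1</p>", RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        ' Sample input; the <b> tags are an assumption about the real markup.
        Dim html As String = "<b>Name1</b>: Some text goes here<br />" & Environment.NewLine &
                             "<b>Name2</b>: Some different text goes here<br />"

        Console.WriteLine(WrapInParagraphs(html))
        ' Prints:
        ' <b>Name1</b><p>Some text goes here</p>
        ' <b>Name2</b><p>Some different text goes here</p>
    End Sub
End Module
```

A MatchEvaluator is still the better fit if the replacement needs logic per match (trimming, encoding, lookups). Note that the pattern starts at the first colon it finds, so a colon inside an attribute of the name tag would need a stricter pattern.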
