Reputation:
I am working on manipulating/extracting data from well-formed HTML in one of our legacy systems. I need to use regex to parse the HTML, find certain patterns, extract the data, and return some modified HTML. I know that regex and HTML are never the answer but, given that I know exactly where the data is coming from and that the data is properly structured, I am confident that this will work for this particular situation.
The HTML that I am working with has the following pattern:
<i>Name1</i>: Some text goes here<br/>
<i>Name2</i>: Some different text goes here<br/>
<i>Name3</i>: Some other different text goes here<br/>
I need to change the HTML to the following:
<i>Name1</i><p>Some text goes here</p>
<i>Name2</i><p>Some different text goes here</p>
<i>Name3</i><p>Some other different text goes here</p>
Basically, I want to take the inner text, wrap it in a p tag and then remove the trailing br.
I want to do something like the following:
Dim HTML as String = [The HTML goes here]
html = Regex.Replace(html, "</i>:(.+?)<br\s*\/?>", "</i><p>(.+?)</p>", RegexOptions.Multiline)
but it obviously isn't working.
In VB.net, how do I replace all desired instances of HTML with the new HTML?
Upvotes: 2
Views: 1172
Reputation: 499002
I suggest using the HTML Agility Pack to parse and manipulate HTML (in particular if the format of the HTML is not regular). The source download comes with a bunch of example projects, so you can see how to use it.
In general, regex is not a good solution for parsing HTML.
Upvotes: 2
Reputation: 49413
Give this a shot:
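Imports System.Text.RegularExpressions

Dim html As String = [The HTML goes here]
' Build the replacement from the captured text (group 1) for each match.
' Trim() drops the space that follows the colon in the source HTML.
Dim evaluator As MatchEvaluator = Function(m As Match)
                                      Return "</i><p>" & m.Groups(1).Value.Trim() & "</p>"
                                  End Function
' RegexOptions.Multiline is not needed here: the pattern uses neither ^ nor $.
html = Regex.Replace(html, "</i>:(.+?)<br\s*/?>", evaluator)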
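If you don't need any per-match logic, a plain substitution string may be enough. The original attempt failed because "(.+?)" in the replacement argument is inserted literally; .NET refers to a captured group there as $1. Below is a minimal sketch of that variant, assuming every entry sits on a single line as in the question. The module name and the sample input are only for illustration, and the \s* after the colon is my addition so the leading space stays out of the captured text.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ReplaceDemo
    Sub Main()
        ' Sample input copied from the question (two of the three lines).
        Dim html As String =
            "<i>Name1</i>: Some text goes here<br/>" & Environment.NewLine &
            "<i>Name2</i>: Some different text goes here<br/>"

        ' \s* after the colon keeps the leading space out of group 1;
        ' $1 in the replacement inserts whatever group 1 captured.
        Dim result As String = Regex.Replace(html, "</i>:\s*(.+?)<br\s*/?>", "</i><p>$1</p>")

        Console.WriteLine(result)
    End Sub
End Module
```

With the sample input this should print each line in the <i>Name</i><p>text</p> form asked for. The MatchEvaluator version above is still the better fit if you later need to transform the captured text rather than just re-wrap it.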
Upvotes: 1