Reputation: 1073
I have HTML like this:
<h1> Headhing </h>
<font name="arial">some text</font></br>
some other text
In C#, I want to get the output below, that is, just the font element from its start tag through its end tag:
<font name="arial">some text</font>
Upvotes: 0
Views: 1336
Reputation: 72652
I wouldn't recommend trying this with regex.
I use the HTML Agility Pack to parse HTML and get what I want. It's a lovely HTML parser that is commonly recommended for this. It will take malformed HTML and massage it into XHTML and then a traversable DOM, like the XML classes. So it is very useful for the kind of code you find in the wild.
There's also an HTML parser from Microsoft, MSHTML, but I haven't tried it.
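As a rough illustration of the Agility Pack approach, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is installed and that the font element can be identified by its name attribute; the XPath expression and variable names are my own choices for this example, not something from the question.

```csharp
// Assumption: the HtmlAgilityPack NuGet package is referenced by the project.
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Sample markup from the question, malformed </h> and </br> included.
        string html = "<h1> Headhing </h>\n" +
                      "<font name=\"arial\">some text</font></br>\n" +
                      "some other text";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath: the first <font> element whose name attribute is "arial".
        HtmlNode font = doc.DocumentNode.SelectSingleNode("//font[@name='arial']");

        // SelectSingleNode returns null when nothing matches.
        if (font != null)
        {
            // OuterHtml is the element's markup including its own tags;
            // InnerText would give only "some text".
            Console.WriteLine(font.OuterHtml);
        }
    }
}
```

If there can be several matching elements, SelectNodes with the same XPath returns a collection instead (or null when there are no matches).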
Upvotes: 4
Reputation: 88044
First off, your HTML is wrong: you should close an <h1> with a </h1>, not </h>. This one thing is why regex is inappropriate for parsing tags.
Second, there are hundreds of questions on SO about parsing HTML with regex. The answer is: don't. Use something like the HTML Agility Pack.
Upvotes: 4
Reputation: 6050
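// Requires: using System.Text.RegularExpressions; html is the string holding your markup.
// .*? is lazy, so each match stops at the first </font> instead of the last one on the line.
Regex regExfont = new Regex(@"<font name=""arial""[^>]*>.*?</font>");
MatchCollection rows = regExfont.Matches(html);
A good site for testing regular expressions is http://www.regexlib.com/RETester.aspx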
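For completeness, a self-contained sketch of the regex approach applied to the markup from the question. The FontExtractor class and ExtractArialFonts method are hypothetical names made up for this example, and the pattern assumes the tag is written exactly as font name="arial" with no nested font elements; anything less regular is better handled by a real parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class FontExtractor
{
    // Hypothetical helper: returns every <font name="arial">...</font> span in the input.
    public static List<string> ExtractArialFonts(string html)
    {
        // Lazy .*? stops at the first </font>; Singleline lets '.' cross line breaks.
        var regex = new Regex(
            @"<font name=""arial""[^>]*>.*?</font>",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);

        var results = new List<string>();
        foreach (Match match in regex.Matches(html))
        {
            results.Add(match.Value);
        }
        return results;
    }
}

class Program
{
    static void Main()
    {
        string html = "<h1> Headhing </h>\n" +
                      "<font name=\"arial\">some text</font></br>\n" +
                      "some other text";

        foreach (string element in FontExtractor.ExtractArialFonts(html))
        {
            Console.WriteLine(element); // prints <font name="arial">some text</font>
        }
    }
}
```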
Upvotes: 1