dps123

Reputation: 1073

Regex to get the tags

I have HTML like this:

<h1> Headhing </h>
<font name="arial">some text</font></br>
some other text

In C#, I want to get the output below: simply the font element, from its start tag through its end tag.

<font name="arial">some text</font>

Upvotes: 0

Views: 1336

Answers (3)

PeeHaa

Reputation: 72652

I wouldn't recommend trying this with regex.

I use the HTML Agility Pack to parse HTML and get what I want. It's a lovely HTML parser that is commonly recommended for this. It will take malformed HTML and massage it into XHTML and then a traversable DOM, like the XML classes, so it is very useful for the HTML you find in the wild.

There's also an HTML parser from Microsoft, MSHTML, but I haven't tried it.
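As a rough illustration of what that looks like for the question's input, here is a minimal sketch. It assumes the third-party HtmlAgilityPack NuGet package is referenced; the class and method names (`FontTagExtractor`, `ExtractFont`) are hypothetical and only there for the example.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack", not part of .NET

public static class FontTagExtractor
{
    // Hypothetical helper: returns the outer HTML of the first
    // <font name="arial"> element, or null if the document has none.
    public static string ExtractFont(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // the parser tolerates malformed markup such as the stray </h>

        // XPath query: any font element whose name attribute is "arial".
        HtmlNode font = doc.DocumentNode.SelectSingleNode("//font[@name='arial']");
        return font?.OuterHtml;
    }

    public static void Main()
    {
        string html = "<h1> Headhing </h>\n<font name=\"arial\">some text</font></br>\nsome other text";
        Console.WriteLine(ExtractFont(html) ?? "(no font element found)");
    }
}
```

`SelectSingleNode` returns null when nothing matches, hence the null check. To collect every font element instead of the first one, `SelectNodes` with the same XPath should work, though it also returns null rather than an empty collection when there are no matches.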

Upvotes: 4

ChrisLively

Reputation: 88044

First off, your HTML is wrong: you should close an <h1> with </h1>, not </h>. Malformed markup like this is exactly why regex is inappropriate for parsing tags.

Second, there are hundreds of questions on SO about parsing HTML with regex. The answer is: don't. Use something like the HTML Agility Pack.

Upvotes: 4

Mike

Reputation: 6050

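 // Requires: using System.Text.RegularExpressions;
 // html is the input string; the lazy .*? stops at the first </font>.
 Regex regExfont = new Regex(@"<font name=""arial""[^>]*>.*?</font>");
 MatchCollection rows = regExfont.Matches(html);

A good site for testing regular expressions is http://www.regexlib.com/RETester.aspx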
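For anyone who wants to run this against the sample HTML from the question, here is a self-contained sketch. The class and method names (`FontRegexDemo`, `FindFontTags`) are hypothetical. The pattern in this sketch uses a lazy `.*?` so that two font elements on the same line are matched separately, plus `Singleline` so the content may span line breaks. It will still fail on nested font tags or on a different attribute order, which is the limitation the other answers are pointing at.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FontRegexDemo
{
    // Lazy .*? stops at the first </font>; Singleline lets '.' match line breaks.
    private static readonly Regex FontPattern = new Regex(
        @"<font name=""arial""[^>]*>.*?</font>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: returns every matched font element as a string.
    public static string[] FindFontTags(string html)
    {
        MatchCollection rows = FontPattern.Matches(html);
        var results = new string[rows.Count];
        for (int i = 0; i < rows.Count; i++)
        {
            results[i] = rows[i].Value;
        }
        return results;
    }

    public static void Main()
    {
        string html = "<h1> Headhing </h>\n<font name=\"arial\">some text</font></br>\nsome other text";
        foreach (string tag in FindFontTags(html))
        {
            Console.WriteLine(tag); // prints: <font name="arial">some text</font>
        }
    }
}
```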

Upvotes: 1
