Reputation: 1671
I have a text file that contains HTML code, and I want to extract only specific tags and save them using C#.
I was thinking of doing it with a few lines of regex. Is that the best and easiest way, or is there an easier function in C# that can do it?
Upvotes: 0
Views: 384
Reputation: 40345
Using regex is probably not the best way to do this; in fact, I would say it's one of the many "bad" ideas you could come up with.
You might want to look into the HTML Agility Pack: it parses the HTML and builds a tree of nodes that you can navigate, so you can get at the tags you're interested in without any "crazy" regex. You'll save yourself a lot of trouble if you avoid regex, since HTML as it is found in the wild can be poor, nasty and brutish, though quite often far from short.
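As a rough sketch of what that could look like, assuming the HtmlAgilityPack NuGet package is referenced in the project: the class name `TagExtractor` and the method `ExtractTags` are made up for illustration, while `HtmlDocument.LoadHtml`, `DocumentNode.SelectNodes` and `OuterHtml` come from the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package, not part of the framework

public static class TagExtractor
{
    // Hypothetical helper: returns the outer HTML of every element
    // whose tag name matches tagName (for example "a" or "img").
    public static List<string> ExtractTags(string html, string tagName)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerates unclosed tags and other sloppy markup

        var results = new List<string>();

        // SelectNodes takes an XPath expression and returns null
        // (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//" + tagName);
        if (nodes != null)
        {
            foreach (HtmlNode node in nodes)
            {
                results.Add(node.OuterHtml);
            }
        }
        return results;
    }
}
```

To go from file to file, something like `File.WriteAllLines("tags.txt", TagExtractor.ExtractTags(File.ReadAllText("page.html"), "a"))` should do, where both file names are placeholders.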
Upvotes: 3
Reputation: 85056
Using regex to parse HTML has been covered at length on SO. The consensus is that it should not be done. Give this post a read to understand why:
RegEx match open tags except XHTML self-contained tags
In the past I have used SGML reader to convert HTML to XML and then used XPath, XSLT or LINQ to XML to parse it. This might work for you as well.
Upvotes: 1
Reputation: 134841
If the HTML is well formed, you could try reading it in using an XML parser and use the methods there. Fortunately there are tools immediately available in the framework to do this. Look into using LINQ to XML to make your job as simple as possible.
Otherwise, if it is not well formed, you could use a third-party tool such as HTML Agility Pack to parse it.
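For the well-formed case, a minimal sketch with LINQ to XML might look like the following. The names `XmlTagExtractor` and `ExtractTags` are hypothetical; `XDocument.Parse` and `Descendants` are framework calls. It assumes the markup has no XML namespace (XHTML with an `xmlns` attribute would need an `XNamespace`), and `XDocument.Parse` throws an `XmlException` if the input is not well formed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlTagExtractor
{
    // Hypothetical helper: returns the markup of every element named tagName.
    // Element names are case-sensitive in XML, so "p" will not match "P".
    public static List<string> ExtractTags(string wellFormedHtml, string tagName)
    {
        XDocument doc = XDocument.Parse(wellFormedHtml);

        return doc.Descendants(tagName) // all matching elements, in document order
                  .Select(e => e.ToString(SaveOptions.DisableFormatting))
                  .ToList();
    }
}
```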
Upvotes: 1
Reputation: 542
Regex can work, but you have to be very careful. HTML is not a "regular language," so there are free-form exceptions that can throw things off. You also have to be careful with matching across line breaks. It can be done, though.
Look into: http://htmlagilitypack.codeplex.com/
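To illustrate the line-break caveat, here is a small sketch (the class `RegexPitfall` and the pattern are only examples): by default `.` does not match `\n`, so a tag whose content spans two lines is silently skipped unless `RegexOptions.Singleline` is set. Even with that option, this pattern still breaks on nested tags or on tags with attributes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexPitfall
{
    // Counts <p>...</p> matches using a deliberately naive pattern.
    public static int CountParagraphs(string html, RegexOptions options)
    {
        return Regex.Matches(html, "<p>(.*?)</p>", options).Count;
    }

    public static void Main()
    {
        string html = "<p>one</p>\n<p>two\nlines</p>";

        // Default options: '.' stops at the newline, so the second <p> is missed.
        Console.WriteLine(CountParagraphs(html, RegexOptions.None));       // prints 1

        // Singleline lets '.' match newlines, so both paragraphs are found.
        Console.WriteLine(CountParagraphs(html, RegexOptions.Singleline)); // prints 2
    }
}
```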
Upvotes: 1
Reputation: 148524
Two options:
1) Write your own loop.
2) Use regex for much better matching and error handling. You'll get matched groups from your regex, and then you can iterate over each item inside them.
Upvotes: -1