dimazaid

Reputation: 1671

Best way to read and parse a text file in C#

I have a text file that contains HTML code, and I want to extract only specific tags and save them using C#.

I was thinking of doing it with a few Regex lines. Is that the best and easiest way to do it, or is there an easier function in C# that can do it?

Upvotes: 0

Views: 384

Answers (5)

Kiril

Reputation: 40345

Using Regex is probably not the best way to do this; actually, I would say that it's one of the numerous "bad" ideas you could think of.

You might want to look into using the HTML Agility Pack: it parses the HTML into a tree of nodes that you can navigate, so you can look at the tags you're interested in without doing any "crazy" regex. You'll save yourself a lot of trouble if you avoid regex, since HTML as it is found in the wild can be poor, nasty and brutish, though quite often far from short.
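As a rough sketch of what that looks like, here is one way to pull every link out of a chunk of HTML with the HTML Agility Pack. It assumes the `HtmlAgilityPack` NuGet package is referenced and a C# 9 or later compiler (top-level statements). The sample markup, the choice of `<a>` tags, and the `ExtractLinks` helper are made up for illustration; swap the XPath for whatever tags you actually need.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package, not part of the framework

// In real use the string would come from the text file, e.g.
// File.ReadAllText("page.html") ("page.html" is a placeholder path).
// Note the unclosed <p>: the parser is expected to tolerate it.
string sampleHtml =
    "<html><body><a href='/one'>One</a><p>text<a href='/two'>Two</a></body></html>";

foreach (string link in ExtractLinks(sampleHtml))
{
    Console.WriteLine(link);
}

// Hypothetical helper: returns "text -> href" for each <a> that has an href.
static List<string> ExtractLinks(string markup)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(markup); // builds the node tree from the raw string

    var results = new List<string>();

    // XPath for every <a> element carrying an href attribute.
    // SelectNodes returns null rather than an empty collection when nothing matches.
    var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
    if (anchors == null)
    {
        return results;
    }

    foreach (HtmlNode anchor in anchors)
    {
        string text = anchor.InnerText.Trim();
        string href = anchor.GetAttributeValue("href", string.Empty);
        results.Add(text + " -> " + href);
    }
    return results;
}
```

The null check matters: forgetting it is a common source of `NullReferenceException` when a page happens to contain none of the tags you are looking for.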

Upvotes: 3

Abe Miessler

Reputation: 85056

Using regex to parse HTML has been covered at length on SO. The consensus is that it should not be done. Give this post a read to understand why:

RegEx match open tags except XHTML self-contained tags

In the past I have used SgmlReader to convert HTML to XML and then used XPath, XSLT, or LINQ to XML to query it. This might work for you as well.

Upvotes: 1

Jeff Mercado

Reputation: 134841

If the HTML is well formed, you could try reading it with an XML parser and using the methods it provides. Fortunately, there are tools available in the framework to do this. Look into LINQ to XML to make your job as simple as possible.

Otherwise, if it is not well formed, you could use a third-party tool such as HTML Agility Pack to parse it.
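For the well-formed case, a minimal LINQ to XML sketch might look like the following. It uses only `System.Xml.Linq` from the framework and assumes a C# 9 or later compiler (top-level statements). The `TextOfTags` helper and the sample markup are invented for the example, and it assumes the document declares no XML namespace (with an XHTML `xmlns`, the tag name would need to be namespace-qualified).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// In real use the string would come from the text file, e.g. File.ReadAllText(path).
string sample =
    "<html><body><h1>Title</h1><p>First</p><p>Second <b>bold</b></p></body></html>";

foreach (string text in TextOfTags(sample, "p"))
{
    Console.WriteLine(text);
}
// prints:
// First
// Second bold

// Hypothetical helper: returns the text content of every element named tagName.
// XDocument.Parse throws System.Xml.XmlException when the markup is not well formed.
static List<string> TextOfTags(string markup, string tagName)
{
    XDocument doc = XDocument.Parse(markup);
    return doc.Descendants(tagName)
              .Select(element => element.Value) // Value concatenates all nested text
              .ToList();
}
```

Anything an XML parser rejects (an unclosed `<br>`, an unquoted attribute, and so on) raises an exception here, which is the signal to fall back to a tolerant HTML parser instead.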

Upvotes: 1

Len

Reputation: 542

Regex can work, but you have to be very careful. HTML is not a "regular language," so there are free-form exceptions that can throw things off. You also have to be careful with matching across line breaks. It can be done, though.
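To make those two pitfalls concrete, here is a small sketch using only `System.Text.RegularExpressions` (C# 9 or later for the top-level statements). The sample strings and patterns are made up for illustration; they are not a recommended way to parse HTML, just a demonstration of where a naive pattern goes wrong.

```csharp
using System;
using System.Text.RegularExpressions;

// Pitfall 1: line breaks. By default '.' does not match '\n', so content
// that spans lines is missed unless RegexOptions.Singleline is set.
string multiLine = "<p>line one\nline two</p>";
Match withoutOption = Regex.Match(multiLine, "<p>(.*?)</p>");
Match withSingleline = Regex.Match(multiLine, "<p>(.*?)</p>", RegexOptions.Singleline);
Console.WriteLine(withoutOption.Success);  // False
Console.WriteLine(withSingleline.Success); // True

// Pitfall 2: nesting. A lazy match stops at the FIRST closing tag, which
// belongs to the inner <div>, so the captured content is cut short.
string nestedHtml = "<div>outer <div>inner</div>\ntail</div>";
Match nested = Regex.Match(nestedHtml, "<div>(.*?)</div>", RegexOptions.Singleline);
Console.WriteLine(nested.Groups[1].Value); // outer <div>inner
```

A greedy `.*` has the opposite problem: with two sibling `<div>` elements it swallows everything from the first opening tag to the last closing tag.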

Look into: http://htmlagilitypack.codeplex.com/

Upvotes: 1

Royi Namir

Reputation: 148524

Two options:

1) Go with your own loop.

2) Use regex for much better matching and error handling. You'll get the matched groups for your regex, and then you can iterate over each item inside them.

Upvotes: -1
