Reputation: 5717

Linq parse html string

I want to parse an HTML page and extract a specific value from it. How can I do this using LINQ or string parsing in C#?

------------- MORE HTML ----------

     <span class="date">
        04.09.2012
    </span>
    <table cellspacing="0"><tr><th scope="row">1 EUR</th><td><span>**4,4907**</span></td><td><span class="rise">+0,0009</span></td><td><span class="rise">+0,02%</span></td></tr><tr><th scope="row">1 USD</th><td><span>3,5635</span></td><td><span class="fall">-0,0093</span></td><td><span class="fall">-0,26%</span></td></tr></table>

------------- MORE HTML ----------

I am interested in getting the value 4,4907 (marked with ** in the snippet above)!

Any idea how to achieve this?

Thanks!

Upvotes: 1

Answers (3)

mortb

Reputation: 9869

Be careful when trying to parse HTML.

I think the obvious way would be to load it into an XDocument (as XML), but real-world HTML is often ambiguous or contains syntax errors such as unclosed tags, so it is frequently not well-formed XML and this approach is likely to fail.
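To illustrate why the XML route is fragile, here is a minimal sketch. The HTML fragment is a made-up example (it is not from the page in the question); it contains an unclosed `<br>`, which browsers accept but an XML parser does not.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

class XDocumentDemo
{
    static void Main()
    {
        // Hypothetical fragment: <br> is fine in HTML but is never closed,
        // so the string is not well-formed XML.
        string html = "<p>line one<br>line two</p>";

        try
        {
            XDocument doc = XDocument.Parse(html);
            Console.WriteLine(doc.Root.Value);
        }
        catch (XmlException ex)
        {
            // XDocument.Parse rejects input that is not well-formed XML.
            Console.WriteLine("Not well-formed XML: " + ex.Message);
        }
    }
}
```

The snippet in the question happens to be well-formed on its own, but the rest of the page ("MORE HTML") may not be, which is why a tolerant HTML parser is the safer choice.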

People here on Stack Overflow have instead suggested using http://htmlagilitypack.codeplex.com/, which is said to do a great job of parsing HTML. You can then use XPath to query the document for the content you need.
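A minimal sketch of that approach, assuming the HtmlAgilityPack package is referenced and that the real page has the same table structure as the snippet in the question. The XPath expression and the comma-as-decimal-separator handling are my assumptions, not something confirmed for the actual page.

```csharp
using System;
using System.Globalization;
using HtmlAgilityPack;

class RateParser
{
    static void Main()
    {
        // Sample markup copied from the question (without the ** markers).
        string html = @"<span class=""date"">04.09.2012</span>
<table cellspacing=""0"">
<tr><th scope=""row"">1 EUR</th><td><span>4,4907</span></td><td><span class=""rise"">+0,0009</span></td></tr>
<tr><th scope=""row"">1 USD</th><td><span>3,5635</span></td><td><span class=""fall"">-0,0093</span></td></tr>
</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumed XPath: the row whose header cell reads "1 EUR",
        // then the span inside its first data cell.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(
            "//tr[th[normalize-space()='1 EUR']]/td[1]/span");

        if (node == null)
        {
            Console.WriteLine("EUR row not found");
            return;
        }

        string text = node.InnerText.Trim();

        // The page uses a comma as the decimal separator, so parse with an
        // explicit number format instead of relying on the machine's culture.
        var format = new NumberFormatInfo
        {
            NumberDecimalSeparator = ",",
            NumberGroupSeparator = "."
        };
        decimal rate = decimal.Parse(text, NumberStyles.Float, format);

        Console.WriteLine(rate.ToString(CultureInfo.InvariantCulture));
    }
}
```

Selecting the row by its header text rather than by position keeps the query working if the order of the currencies changes. If the page contains several tables of this shape, the XPath would need to be narrowed further.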

Upvotes: 1