Reputation: 5717
I want to parse an HTML page and get a specific value from it. How can I do this using LINQ or string parsing in C#?
------------- MORE HTML ----------
<span class="date">
04.09.2012
</span>
<table cellspacing="0"><tr><th scope="row">1 EUR</th><td><span>**4,4907**</span></td><td><span class="rise">+0,0009</span></td><td><span class="rise">+0,02%</span></td></tr><tr><th scope="row">1 USD</th><td><span>3,5635</span></td><td><span class="fall">-0,0093</span></td><td><span class="fall">-0,26%</span></td></tr></table>
------------- MORE HTML ----------
I am interested in getting the value 4,4907 in bold!
Any idea how to achieve this?
Thanks!
Upvotes: 1
Views: 2613
Reputation: 9849
Be careful when trying to parse HTML.
I think the obvious way would be to load it into an XDocument (as XML), but real-world HTML is often ambiguous or not well-formed XML (unclosed tags, unquoted attributes, syntax errors), so this is likely to fail on most pages.
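To illustrate both halves of that point, here is a minimal sketch using LINQ to XML. It assumes the table fragment from the question happens to be well-formed (it is, as posted) and that the `**` around the value only marks the bold text. The class and method names are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class XDocumentSketch
{
    // Returns the text of the first <span> in the row whose header is "1 EUR",
    // or null if there is no such row. Only works on well-formed markup.
    public static string GetEurRate(string wellFormedFragment)
    {
        XElement table = XElement.Parse(wellFormedFragment);
        return table.Descendants("tr")
            .Where(tr => (string)tr.Element("th") == "1 EUR")
            .SelectMany(tr => tr.Elements("td").Elements("span"))
            .Select(span => span.Value)
            .FirstOrDefault();
    }

    public static void Main()
    {
        string fragment =
            "<table cellspacing=\"0\"><tr><th scope=\"row\">1 EUR</th>" +
            "<td><span>4,4907</span></td>" +
            "<td><span class=\"rise\">+0,0009</span></td></tr></table>";
        Console.WriteLine(GetEurRate(fragment)); // prints 4,4907

        try
        {
            // Typical HTML: <br> is never closed, so this is not XML.
            XElement.Parse("<p>line one<br>line two</p>");
        }
        catch (XmlException ex)
        {
            Console.WriteLine("Not well-formed XML: " + ex.Message);
        }
    }
}
```

This works for the fragment, but a full page with a single unclosed tag would throw before any query runs.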
People here on Stack Overflow have instead suggested using the HTML Agility Pack (http://htmlagilitypack.codeplex.com/), which is said to do a great job of parsing HTML. You can then use XPath to query your document for the content you need.
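A rough sketch of what that could look like for the table in the question, assuming the HtmlAgilityPack NuGet package is referenced. The XPath expression is my own guess at a reasonable query (find the "1 EUR" row header, then the span in the next cell), and the class and method names are hypothetical.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class AgilityPackSketch
{
    // Returns the rate shown next to the "1 EUR" row header, or null if not found.
    public static string GetEurRate(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant of malformed HTML, unlike an XML parser

        HtmlNode span = doc.DocumentNode.SelectSingleNode(
            "//th[normalize-space(.)='1 EUR']/following-sibling::td[1]/span");
        return span == null ? null : span.InnerText.Trim();
    }

    public static void Main()
    {
        string html =
            "<span class=\"date\">04.09.2012</span>" +
            "<table cellspacing=\"0\"><tr><th scope=\"row\">1 EUR</th>" +
            "<td><span>4,4907</span></td>" +
            "<td><span class=\"rise\">+0,0009</span></td></tr></table>";
        Console.WriteLine(GetEurRate(html));
    }
}
```

Anchoring the query on the "1 EUR" header rather than on the position of the table should keep it working if other rows are added or reordered.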
Upvotes: 1
Reputation: 6087
You can try a regular expression in C# this way:
http://www.c-sharpcorner.com/UploadFile/prasad_1/RegExpPSD12062005021717AM/RegExpPSD.aspx
Use it to find the string between "<span>" and "</span>" (the asterisks in the question appear to only mark the bold text).
Or you can use an HTML parser like "jericho" (note that Jericho is a Java library, so in C# you would need a .NET equivalent) and navigate through the HTML tags to reach your value.
Upvotes: 0
Reputation: 168913
If you only need that bit, use a regular expression. (But don't use a regular expression to parse more complex HTML.)
<td><span>4,4907</span></td>
would be matched most conveniently by the regular expression
<td><span>([0-9,]+)</span></td>
And see for example this quickly Googled page on how to use regexps with C#.
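For reference, a minimal sketch of applying that pattern with `System.Text.RegularExpressions`. It assumes the markup looks exactly like the snippet in the question (no attributes or whitespace inside `<td><span>`), and the class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSketch
{
    // Returns the first value wrapped in <td><span>...</span></td>, or null if none matches.
    public static string ExtractFirstRate(string html)
    {
        Match m = Regex.Match(html, @"<td><span>([0-9,]+)</span></td>");
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string html =
            "<table cellspacing=\"0\"><tr><th scope=\"row\">1 EUR</th>" +
            "<td><span>4,4907</span></td>" +
            "<td><span class=\"rise\">+0,0009</span></td></tr>" +
            "<tr><th scope=\"row\">1 USD</th>" +
            "<td><span>3,5635</span></td></tr></table>";
        Console.WriteLine(ExtractFirstRate(html)); // prints 4,4907
    }
}
```

`Regex.Match` returns only the first hit, which is the EUR row here. The change columns are skipped because their spans carry a `class` attribute and contain `+`, `-` or `%`. If the page's markup changes even slightly, the pattern will silently stop matching, which is the usual trade-off with this approach.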
Upvotes: 4