Reputation: 880
Here is my regular expression:
Dim TableHeaderExpression As String = "<th[^>]*>(.*?)</th>"
And here is my HTML:
<th class="seller-col">
<b>Relevanz</b>
<span class="ps-sprite ps-sprite-sortdw" title=""></span>
</th>
This expression gives me everything inside the th tag, so it outputs:
<b>Relevanz</b>
<span class="ps-sprite ps-sprite-sortdw" title=""></span>
But how do I make it output only:
Relevanz
That is, ignore all the text inside <th> except for what's inside <b>.
Upvotes: 0
Views: 47
Reputation: 499212
Regex is not the best option for parsing HTML. Instead, use the HTML Agility Pack to parse and query the HTML.
This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually have to understand XPath or XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to what System.Xml offers, but for HTML documents (or streams).
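As a rough illustration, here is a minimal sketch of how the question's markup might be queried with the HTML Agility Pack. It assumes the HtmlAgilityPack package is referenced by the project. The surrounding <table><tr> wrapper, the module name Example and the variable names are made up for the example, and the XPath //th/b is just one way to express "the <b> directly inside a <th>".

```vb
Imports System
Imports HtmlAgilityPack

Module Example
    Sub Main()
        ' Sample markup from the question, wrapped in a table row (an assumption).
        Dim html As String =
            "<table><tr><th class=""seller-col"">" &
            "<b>Relevanz</b>" &
            "<span class=""ps-sprite ps-sprite-sortdw"" title=""""></span>" &
            "</th></tr></table>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' Select every <b> that is a direct child of a <th>.
        ' SelectNodes returns Nothing when no node matches, so check first.
        Dim boldNodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//th/b")
        If boldNodes IsNot Nothing Then
            For Each node As HtmlNode In boldNodes
                ' InnerText is the text content without any tags.
                Console.WriteLine(node.InnerText.Trim())
            Next
        End If
    End Sub
End Module
```

For the sample markup this should print Relevanz, leaving the <span> out because only the <b> element is selected.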
Upvotes: 1