Over the last few months I have written a few programs that load HTML pages into a string and then do various things with them, such as extracting bits and pieces. I was basically writing my own GUI for some websites that have no API.
I've done this by stringing together many String.Substring(), String.IndexOf(), and String.LastIndexOf() calls.
I realise this is probably not the best way to do it - I was just writing a few "quick-and-dirty" trials to begin with.
What is the proper way to extract tokens from a web page? Thanks :)
For XHTML (which is well-formed XML), load it into an XmlDocument or XDocument.
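As a minimal sketch of the XHTML route, the following uses XDocument (LINQ to XML, part of the .NET base class library) to pull the href and text of every link out of a page, replacing a chain of IndexOf/Substring calls with a single query. It assumes the markup is well-formed and declares the standard XHTML namespace; the names XhtmlLinks and ExtractLinks are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XhtmlLinks
{
    // Elements in an XHTML page live in this namespace, so element names must be qualified with it.
    static readonly XNamespace Xhtml = "http://www.w3.org/1999/xhtml";

    // Hypothetical helper: returns (href, link text) for every <a> element that has an href attribute.
    public static List<(string Href, string Text)> ExtractLinks(string xhtml)
    {
        // Throws XmlException if the markup is not well-formed XML.
        XDocument doc = XDocument.Parse(xhtml);

        return doc.Descendants(Xhtml + "a")
                  .Where(a => a.Attribute("href") != null)
                  .Select(a => ((string)a.Attribute("href"), a.Value.Trim()))
                  .ToList();
    }

    public static void Main()
    {
        string page =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>" +
            "<p>See <a href=\"/docs\">the docs</a> or the <a href=\"/faq\"> FAQ </a>.</p>" +
            "<a id=\"top\">no href here</a>" +
            "</body></html>";

        foreach (var (href, text) in ExtractLinks(page))
            Console.WriteLine($"{text} -> {href}");
    }
}
```

Running it prints "the docs -> /docs" and "FAQ -> /faq"; the third anchor is skipped because it has no href. If a page omits the xmlns declaration, its elements are in no namespace and you would query Descendants("a") instead.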
For ordinary (non-X)HTML, which is often not well-formed XML, load it into the HTML Agility Pack's HtmlDocument; its API is almost the same as XmlDocument's, so it should feel familiar.
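A comparable sketch for ordinary HTML, assuming the HtmlAgilityPack NuGet package is referenced by the project. It loads the page with HtmlDocument.LoadHtml and queries it with XPath through SelectNodes, in the same style you would use with XmlDocument; HtmlLinks and ExtractLinks are again hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the "HtmlAgilityPack" NuGet package is installed

public static class HtmlLinks
{
    // Hypothetical helper: returns (href, link text) for every <a> element that has an href attribute.
    public static List<(string Href, string Text)> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant parser: unclosed tags and similar tag soup are accepted

        // SelectNodes returns null (not an empty collection) when the XPath matches nothing.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
            return new List<(string Href, string Text)>();

        return anchors
            .Select(a => (a.GetAttributeValue("href", ""),
                          HtmlEntity.DeEntitize(a.InnerText).Trim())) // decode entities such as &amp;
            .ToList();
    }

    public static void Main()
    {
        // Not well-formed XML: the <li> and <p> elements are never closed.
        string page =
            "<html><body><ul>" +
            "<li><a href=\"/docs\">the docs</a>" +
            "<li><a href='/faq'>the FAQ</a>" +
            "</ul><p>no links here</body></html>";

        foreach (var (href, text) in ExtractLinks(page))
            Console.WriteLine($"{text} -> {href}");
    }
}
```

Compared with chained IndexOf/Substring calls, the XPath query keeps working when attributes change order, switch quote style, or gain extra whitespace.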