Reputation: 16
I want my program to automatically download only certain information off a website. After finding out that this is nearly impossible I figured it would be best if the program would just download the entire web page and then find the information that I needed inside of a string.
How can I find certain words/numbers after specific words? The word before the number I want to have is always the same. The number varies and that is the number I need in my program.
Upvotes: 0
Views: 215
Reputation: 4182
I've used HTML Agility Pack for multiple applications and it works well. Lots of options too.
It's a lovely HTML parser that is commonly recommended for this. It will take malformed HTML and massage it into XHTML and then a traversable DOM, like the XML classes. So, is very useful for the code you find in the wild.
Upvotes: 1
Reputation: 13402
Sounds like screen scraping. I recommend using CSQuery https://github.com/jamietre/CsQuery (or HtmlAgilityPack if you want). Get the source, parse as object, loop over all text nodes and do your string comparison there. The actual way of doing this varies a LOT on how the source HTML is done.
Maby something like this untested example written from memory (CSQuery)
var dom = CQ.Create(stringWithHtml);
dom["*"].Each((i, e) =>
{
// handle only text nodes
if (e.NodeType == NodeType.TEXT_NODE) {
// do your check here
}
}
Upvotes: 3