platypusq

Reputation: 16

How do I scrape a website for information?

I want my program to automatically download only certain information from a website. Since fetching just that piece seems nearly impossible, I figured it would be best to have the program download the entire web page and then find the information I need inside the resulting string.

How can I find certain words or numbers that come after specific words? The word before the number I want is always the same. The number varies, and that number is what I need in my program.

Upvotes: 0

Views: 215

Answers (2)

jordanhill123

Reputation: 4182

I've used HTML Agility Pack for multiple applications and it works well. Lots of options too.

It's a lovely HTML parser that is commonly recommended for this. It will take malformed HTML and massage it into XHTML and then a traversable DOM, like the XML classes. So it is very useful for the HTML you find in the wild.
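As a rough illustration of the "traversable DOM" idea, here is a minimal sketch that loads an HTML string with HTML Agility Pack and collects the non-empty text nodes. It assumes the HtmlAgilityPack NuGet package is installed; the class and method names (PageText, ExtractTextNodes) are made up for this example and are not part of the library.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class PageText
{
    // Hypothetical helper: returns the trimmed, non-empty text nodes of an HTML string.
    public static List<string> ExtractTextNodes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerates malformed markup

        var result = new List<string>();

        // XPath "//text()" selects every text node; SelectNodes returns null when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//text()");
        if (nodes == null)
        {
            return result;
        }

        foreach (var node in nodes)
        {
            // Decode entities such as &amp; and drop whitespace-only nodes.
            var text = HtmlEntity.DeEntitize(node.InnerText).Trim();
            if (text.Length > 0)
            {
                result.Add(text);
            }
        }
        return result;
    }
}
```

Note that "//text()" also picks up the contents of script and style elements, so on a real page you would probably want a narrower XPath that targets the part of the page you care about.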

Upvotes: 1

Joel Peltonen

Reputation: 13402

Sounds like screen scraping. I recommend using CsQuery https://github.com/jamietre/CsQuery (or HtmlAgilityPack if you prefer). Get the page source, parse it into a DOM object, loop over all text nodes and do your string comparison there. The actual way of doing this depends a LOT on how the source HTML is structured.

Maybe something like this untested example written from memory (CsQuery):

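var dom = CQ.Create(stringWithHtml);
// "*" matches elements only, so look at each element's child nodes to reach the text
dom["*"].Each((i, e) =>
{
    foreach (var child in e.ChildNodes)
    {
        // handle only text nodes
        if (child.NodeType == NodeType.TEXT_NODE)
        {
            // do your check here
        }
    }
});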
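
For the check itself, one option is a regular expression that looks for the fixed word and captures the number right after it. This is only a sketch using the standard System.Text.RegularExpressions classes. The names LabelScanner and FindNumberAfter are invented for the example, and the pattern assumes the number follows the label directly, optionally separated by whitespace and a colon.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LabelScanner
{
    // Hypothetical helper: returns the first number that follows `label` in `text`,
    // as a string, or null when the label is not followed by a number.
    public static string FindNumberAfter(string text, string label)
    {
        // Regex.Escape keeps characters such as '.' or '(' in the label literal.
        // The capture group accepts an optional minus sign and an optional decimal part.
        var pattern = Regex.Escape(label) + @"\s*:?\s*(-?\d+(?:\.\d+)?)";
        var match = Regex.Match(text, pattern);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

For example, LabelScanner.FindNumberAfter("Price: 42 EUR", "Price") returns "42". If the site formats numbers with thousands separators or a decimal comma, the pattern would need adjusting before you parse the result into a numeric type.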

Upvotes: 3
