Reputation: 1394
There are a number of products out there that provide a GUI for picking out the tags you want to scrape from a web page (WebHarvy, for example).
I've seen the HTML Agility Pack before for getting at the DOM. I just wanted to check whether anyone knows of any nice libraries or processes for automatically finding the useful content within an HTML page and creating the XPath required.
Something similar to how Evernote and iOS work out where the "article" is on a page, but ideally also working for repeating regions and pagination.
Upvotes: 0
Views: 901
Reputation: 12713
Not sure if this is what you are looking for:
http://www.diffbot.com/
But Diffbot is good at scraping content from websites.
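If you would rather try something by hand first, here is a minimal sketch of one common heuristic: credit the text of every paragraph to the element that contains it, then emit a positional XPath for the container with the most paragraph text. This is only an illustration of the idea in the question, not a description of how Diffbot, Evernote or iOS do it. The names `ParagraphScorer` and `guess_content_xpath` are made up for this example, it uses only the Python standard library, and it does not attempt repeating regions or pagination.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ParagraphScorer(HTMLParser):
    """Credits the text length of each <p> to the element that contains it."""

    def __init__(self):
        super().__init__()
        # Each entry is (tag, xpath, per-tag child counters); entry 0 is a sentinel root.
        self.stack = [(None, "", {})]
        self.scores = {}  # container xpath -> characters of paragraph text

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        _, parent_path, child_counts = self.stack[-1]
        # XPath positions are 1-based and counted per tag name among siblings.
        child_counts[tag] = child_counts.get(tag, 0) + 1
        path = f"{parent_path}/{tag}[{child_counts[tag]}]"
        self.stack.append((tag, path, {}))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerates unclosed children.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Find the nearest enclosing <p> and credit its parent element.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == "p":
                container_path = self.stack[i - 1][1]
                if container_path:
                    self.scores[container_path] = (
                        self.scores.get(container_path, 0) + len(text)
                    )
                break


def guess_content_xpath(html):
    """Return the XPath of the element holding the most paragraph text, or None."""
    parser = ParagraphScorer()
    parser.feed(html)
    parser.close()
    if not parser.scores:
        return None
    return max(parser.scores, key=parser.scores.get)


if __name__ == "__main__":
    sample = """<html><body>
    <div><p>Menu</p></div>
    <div><p>First long paragraph of the article text.</p>
    <p>Second paragraph with more words.</p></div>
    </body></html>"""
    print(guess_content_xpath(sample))
```

The resulting XPath can then be handed to whatever XPath-capable parser you already use (the HTML Agility Pack's `SelectSingleNode`, for instance), with the caveat that different parsers may repair malformed HTML differently, so positional paths are not guaranteed to match across libraries.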
Upvotes: 1