Reputation: 143
This is a pattern recognition task for a web crawler. A traditional crawler fetches the data of the whole page. Is there any way to make the crawler a little more intelligent, so that it identifies and captures only the informative part of the page?
Upvotes: 0
Views: 60
Reputation: 4749
This is a research problem known as wrapper induction or web data extraction. I don't know of any library for this, but there are a lot of research papers (see the list of good ones below, IMHO) and some research projects such as DIADEM (their site lists their publications as well).
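To give a feel for what "capturing only the information part" can mean in practice, here is a minimal sketch of a much simpler technique than wrapper induction: a text-density heuristic that picks the block with the most non-link text. It is not taken from any of the papers or from DIADEM. The names `BlockScorer`, `extract_main_text`, `BLOCK_TAGS` and `SAMPLE_HTML`, and the scoring formula, are all assumptions made for illustration, and it uses only the Python standard library.

```python
from html.parser import HTMLParser

# Tags whose text is never treated as page content.
SKIP_TAGS = {"script", "style", "noscript"}
# Containers treated as candidate content regions (an assumed, incomplete set).
BLOCK_TAGS = {"div", "article", "section", "main", "td"}


class BlockScorer(HTMLParser):
    """Collects text per block and counts how much of it sits inside links."""

    def __init__(self):
        super().__init__()
        self.stack = []       # currently open candidate blocks
        self.blocks = []      # candidate blocks that have been closed
        self.skip_depth = 0   # > 0 while inside script/style/noscript
        self.link_depth = 0   # > 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "a":
            self.link_depth += 1
        elif tag in BLOCK_TAGS:
            self.stack.append({"text": [], "chars": 0, "link_chars": 0})

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "a" and self.link_depth:
            self.link_depth -= 1
        elif tag in BLOCK_TAGS and self.stack:
            # Simplification: assumes block tags are properly nested.
            self.blocks.append(self.stack.pop())

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth or not self.stack:
            return
        block = self.stack[-1]  # credit text to the innermost open block only
        block["text"].append(text)
        block["chars"] += len(text)
        if self.link_depth:
            block["link_chars"] += len(text)


def extract_main_text(html):
    """Return the text of the block with the highest non-link text score."""
    parser = BlockScorer()
    parser.feed(html)
    parser.close()
    blocks = parser.blocks + parser.stack  # include blocks left unclosed
    if not blocks:
        return ""
    # Assumed scoring: reward text, penalise link text (menus, footers).
    best = max(blocks, key=lambda b: b["chars"] - 2 * b["link_chars"])
    return " ".join(best["text"])


SAMPLE_HTML = """
<html><body>
<div id="nav"><a href="/">Home</a> <a href="/about">About us</a></div>
<div id="content"><p>Wrapper induction learns extraction rules from example pages.</p>
<p>The rules are then applied to pages that share the same template.</p></div>
<div id="footer"><a href="/terms">Terms</a></div>
<script>var x = "ignored";</script>
</body></html>
"""

if __name__ == "__main__":
    print(extract_main_text(SAMPLE_HTML))
```

On the sample page this should prefer the `content` block over the navigation and footer, because those consist almost entirely of link text. Real pages are far messier, which is exactly why the research systems learn extraction rules per site template instead of relying on one fixed heuristic like this.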
Upvotes: 1