Reputation: 39
I'm looking for a good Java API for web scraping. I tried the Web-Harvest API (http://web-harvest.sourceforge.net/usage.php), but I find it a bit clunky. Any other suggestions?
Upvotes: 3
Views: 3078
Reputation: 1832
I use this: https://github.com/subes/invesdwin-webproxy
It supports HttpClient and HtmlUnit (a headless browser that supports JavaScript) and, if required, parallelizes requests over a large pool of proxies. I can also recommend JSoup for static HTML processing.
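To give a rough idea of what "parallelizing over a pool of proxies" means, here is a minimal sketch using only the JDK's built-in `java.net.http.HttpClient` (Java 11+). This is not the invesdwin-webproxy API; the class name `ProxyPoolFetcher`, its methods, and the proxy host names are hypothetical and chosen for illustration only.

```java
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: spreads asynchronous GET requests over a fixed list of proxies.
public class ProxyPoolFetcher {
    private final List<InetSocketAddress> proxies;
    private final AtomicInteger next = new AtomicInteger();

    public ProxyPoolFetcher(List<InetSocketAddress> proxies) {
        if (proxies.isEmpty()) {
            throw new IllegalArgumentException("at least one proxy is required");
        }
        this.proxies = List.copyOf(proxies);
    }

    // Picks the next proxy in round-robin order; safe to call from several threads.
    public InetSocketAddress nextProxy() {
        int index = Math.floorMod(next.getAndIncrement(), proxies.size());
        return proxies.get(index);
    }

    // Starts a non-blocking GET through the next proxy and returns the body as a future.
    // For brevity a client is built per request; a real scraper would likely
    // cache one client per proxy.
    public CompletableFuture<String> fetch(String url) {
        HttpClient client = HttpClient.newBuilder()
                .proxy(ProxySelector.of(nextProxy()))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(HttpResponse::body);
    }
}
```

Calling `fetch` for each URL and then waiting on the futures with `CompletableFuture.allOf(...)` runs the downloads concurrently, each through a different proxy in turn. The returned HTML strings can then be handed to a parser such as JSoup.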
Upvotes: 0
Reputation: 1946
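Apache HttpClient: http://hc.apache.org/httpcomponents-client-ga/
(Maven dependency. Note that these coordinates are for the legacy Commons HttpClient 3.1; its successor, the HttpComponents client linked above, is published as org.apache.httpcomponents:httpclient.)
<dependency>
    <groupId>commons-httpclient</groupId>
    <artifactId>commons-httpclient</artifactId>
    <version>3.1</version>
</dependency>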
Upvotes: 0