finfinni

Reputation: 39

Java API for web scraping or web mining

I'm looking for a good Java API for web scraping. I tried the Web-Harvest API (http://web-harvest.sourceforge.net/usage.php), but I find it a bit clunky. Any other suggestions?

Upvotes: 3

Views: 3078

Answers (3)

subes

Reputation: 1832

I use this: https://github.com/subes/invesdwin-webproxy

It supports HttpClient and HtmlUnit (a headless browser that supports JavaScript) and, if required, parallelizes requests over a large pool of proxies. I can also recommend JSoup for static HTML processing.

Upvotes: 0

BZ.

Reputation: 1946

http://hc.apache.org/httpcomponents-client-ga/

(Maven Dependency)

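<!-- The linked HttpComponents client is published under org.apache.httpcomponents;
     commons-httpclient 3.1 is the older, end-of-life Commons HttpClient. -->
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
  <version>4.5.14</version>
</dependency>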
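
To make the fetch step concrete, here is a minimal sketch of downloading a page and pulling out its links. It is not the Apache HttpClient API: it swaps in the JDK's built-in java.net.http.HttpClient (Java 11+) so that it runs without extra dependencies. The class name LinkScraper, the User-Agent string, and the example.com URL are all made up for illustration. The regex-based extraction is only meant to show the idea; a real HTML parser such as JSoup would likely be more robust.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; not part of any library mentioned in this thread.
public class LinkScraper {

    // Matches href="..." or href='...' inside <a> tags. A regex is only good
    // enough for a quick sketch; it does not handle unquoted attributes.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Downloads the body of the given URL as a string, following redirects. */
    static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "example-scraper/0.1") // assumed agent string
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    /** Returns the href values of all anchor tags, in document order. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2)); // group 2 is the text between the quotes
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL; pass a real one as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.com/";
        for (String link : extractLinks(fetch(url))) {
            System.out.println(link);
        }
    }
}
```

Keeping fetch and extractLinks separate means the parsing step can be tried on a saved HTML string without touching the network, and the regex can later be replaced by a proper parser without changing the download code.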

Upvotes: 0

Speck

Reputation: 2309

I've used HttpUnit for exactly this task in production.

Upvotes: 0
