Reputation: 31
Recently, I had to crawl some websites with the open-source project crawler4j. However, crawler4j doesn't offer any API for this. My problem is: how can I parse the HTML with the functions and classes provided by crawler4j and find elements the way we do with jQuery?
Upvotes: 3
Views: 2503
Reputation: 155
It's relatively simple. The following approach worked for me.
In MyCrawler.java:
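    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import edu.uci.ics.crawler4j.crawler.Page;
    import edu.uci.ics.crawler4j.parser.HtmlParseData;
    ...
    @Override
    public void visit(Page page) {
        ...
        if (page.getParseData() instanceof HtmlParseData) {
            HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
            String html = htmlParseData.getHtml();
            // getHtml() returns the whole page, so parse it as a full document
            Document doc = Jsoup.parse(html);
            ...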
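
Once the HTML is in a jsoup Document, elements can be looked up with CSS selectors, much like in jQuery. The following is only a minimal sketch of that idea, not part of crawler4j: the class name SelectorSketch, the helpers firstHeading and extractLinks, the sample HTML, and the base URL http://example.com are all made up for illustration. It uses only jsoup's Jsoup.parse, select, first, text, and absUrl.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectorSketch {

    // Hypothetical helper: text of the first <h1>, or "" if there is none.
    static String firstHeading(String html) {
        Document doc = Jsoup.parse(html);
        Element h1 = doc.select("h1").first();
        return h1 == null ? "" : h1.text();
    }

    // Hypothetical helper: absolute URL of every <a href=...> in the page.
    // baseUrl is used to resolve relative links such as "/about".
    static List<String> extractLinks(String html, String baseUrl) {
        Document doc = Jsoup.parse(html, baseUrl);
        List<String> links = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {
            links.add(a.absUrl("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch.
        String html = "<html><body><h1 class=\"title\">Hello</h1>"
                + "<a href=\"/about\">About</a></body></html>";
        System.out.println(firstHeading(html));
        System.out.println(extractLinks(html, "http://example.com"));
    }
}
```

Inside visit(Page page), the same select calls could be applied to the doc variable directly. Passing the page URL (for example page.getWebURL().getURL()) as the base URL is one way to get absolute links, assuming that is what you need.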
Upvotes: 8