Hadoop MapReduce Based Web Java Crawler

Question

I want to implement a Java crawler on the Hadoop framework using the MapReduce architecture and insert the content into HBase. I am trying to combine these two tutorials:

Basic web crawler example

MapReduce tutorial

But I can't understand the concept. Where should I put the logic for extracting the links from a page? What is the input data type of the Mapper? Thanks in advance.

Julien Nioche · Accepted Answer

Just use Apache Nutch - it is based on Hadoop and has everything you should need and more.
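
If you still want to see where the link-extraction logic would sit in a hand-written job, here is a minimal sketch, not how Nutch does it. It assumes the job input is a plain text file with one URL per line; with Hadoop's default TextInputFormat the Mapper then receives a LongWritable byte offset as the key and a Text line (the URL) as the value, and the map function would fetch the page and call something like the helper below. The class name `LinkExtractor` and the method `extractLinks` are hypothetical names made up for this example, and the regex is a naive stand-in for a real HTML parser. Only the JDK is used, so it runs without Hadoop on the classpath.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the part of a crawler's map step that turns
// one fetched page into a list of absolute outgoing links.
class LinkExtractor {
    // Naive matcher for quoted href attributes (assumption: good enough
    // for a sketch; a real crawler should use an HTML parser).
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // pageUrl is the URL the HTML was fetched from; relative links
    // are resolved against it.
    static List<String> extractLinks(String pageUrl, String html) {
        List<String> links = new ArrayList<>();
        URI base = URI.create(pageUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                links.add(base.resolve(m.group(1).trim()).toString());
            } catch (IllegalArgumentException e) {
                // Skip href values that are not syntactically valid URIs.
            }
        }
        return links;
    }
}
```

In a MapReduce job, the map function would emit each returned link (for example as a Text key) so that the reduce step can deduplicate URLs before the next crawl round or write the results to HBase.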
