Aissam Jadli

Reputation: 75

Hadoop MapReduce Based Web Java Crawler

I want to implement a Java crawler based on the Hadoop framework, using the MapReduce architecture, and insert the crawled content into HBase. I tried to combine these two tutorials:

Basic web crawler example

MapReduce tutorial

But I can't grasp the overall concept: where should the logic for extracting links from a page go, and what is the input data type of the Mapper? Thanks in advance.

Upvotes: 1

Views: 1744

Answers (1)

Julien Nioche

Reputation: 4864

Just use Apache Nutch - it is based on Hadoop and has everything you should need and more.
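To make the two specific questions concrete, here is a minimal sketch (not Nutch's code; the class name CrawlJob, the "crawl" table name, and the "links" column family are assumptions) of how a plain Hadoop job could be wired up. The Mapper reads a text file of seed URLs, so its input types are LongWritable (byte offset) and Text (the URL); link extraction lives inside map(); a TableReducer then writes the discovered links into HBase.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CrawlJob {

    // Input is one seed URL per line, so the Mapper's input types are
    // LongWritable (byte offset in the file) and Text (the URL itself).
    // The link-extraction logic goes inside map(): fetch the page, parse it,
    // and emit (sourceUrl, outgoingLink) pairs.
    public static class CrawlMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text value, Context context)
                throws IOException, InterruptedException {
            String url = value.toString().trim();
            if (url.isEmpty()) {
                return;
            }
            try {
                Document doc = Jsoup.connect(url).get();    // fetch and parse, as in the crawler tutorial
                for (Element link : doc.select("a[href]")) {
                    String target = link.attr("abs:href");   // resolve relative links
                    if (!target.isEmpty()) {
                        context.write(new Text(url), new Text(target));
                    }
                }
            } catch (IOException e) {
                // skip unreachable pages; a real crawler would record the failure
            }
        }
    }

    // The Reducer writes one HBase row per crawled URL, storing the discovered
    // links in a "links" column family (assumed to exist in the "crawl" table).
    public static class CrawlReducer extends TableReducer<Text, Text, ImmutableBytesWritable> {
        @Override
        protected void reduce(Text url, Iterable<Text> links, Context context)
                throws IOException, InterruptedException {
            Put put = new Put(Bytes.toBytes(url.toString()));
            int i = 0;
            for (Text link : links) {
                put.addColumn(Bytes.toBytes("links"), Bytes.toBytes("l" + i++),
                        Bytes.toBytes(link.toString()));
            }
            context.write(new ImmutableBytesWritable(put.getRow()), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "crawl");
        job.setJarByClass(CrawlJob.class);
        job.setMapperClass(CrawlMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0])); // file of seed URLs
        TableMapReduceUtil.initTableReducerJob("crawl", CrawlReducer.class, job);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that this sketch ignores everything a real crawler needs (politeness delays, deduplication, fetch scheduling, recrawling), which is exactly what Nutch already provides on top of Hadoop.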

Upvotes: 1
