Reputation: 75
I want to implement a Java crawler on the Hadoop framework using MapReduce and insert the crawled content into HBase. I tried to combine these 2 tutorials:
But I can't understand the concept. Where should I put the logic for extracting the links from a page? What is the input data type of the Mapper? Thanks in advance.
Upvotes: 1
Views: 1744
Reputation: 4864
Just use Apache Nutch - it is based on Hadoop and has everything you should need and more.
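If you still want to hand-roll it, the link extraction would typically live in the map step: each map call handles one page and emits the outgoing URLs it finds. With the default `TextInputFormat` the Mapper receives a `LongWritable` byte offset as key and a `Text` line as value, so one common arrangement is a seed file with one URL per line. Below is a rough sketch of just the extraction logic in plain Java, with no Hadoop dependency, so it can be tried on its own. `LinkExtractor` and `extractLinks` are hypothetical names of mine, and the regex is a crude stand-in for a real HTML parser.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the kind of logic one might call from inside map().
final class LinkExtractor {
    // Matches href="..." or href='...'. A regex is only an approximation of HTML parsing.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns absolute http/https links found in html, resolved against pageUrl.
    static List<String> extractLinks(String pageUrl, String html) {
        List<String> links = new ArrayList<>();
        URI base = URI.create(pageUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String raw = m.group(1).trim();
            int hash = raw.indexOf('#');
            if (hash >= 0) {
                raw = raw.substring(0, hash); // drop the fragment part
            }
            if (raw.isEmpty()) {
                continue; // pure in-page anchor such as href="#top"
            }
            try {
                URI resolved = base.resolve(raw); // turns relative links into absolute ones
                String scheme = resolved.getScheme();
                if ("http".equals(scheme) || "https".equals(scheme)) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // Malformed href: skip it rather than fail the whole page.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"b.html\">b</a> <a href='mailto:me@example.com'>mail</a>";
        for (String link : extractLinks("http://example.com/a/", html)) {
            System.out.println(link);
        }
    }
}
```

In a real job the map method would fetch the page for its input URL, call something like `extractLinks`, and emit each link so the next iteration can crawl it. Nutch already does all of this, plus politeness and deduplication, which is why I'd start there.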
Upvotes: 1