Reputation: 1483
Which sites cover Hadoop best practices? I am not looking for books, but for sites where I can find a step-by-step process for creating new projects, along with small examples. I have not been able to find a single site like this; please share.
Upvotes: 4
Views: 1939
Reputation: 20576
Hadoop is not a single application; it is a distributed processing framework used by several applications that sit on top of it. Pig, Hive, HBase, Cassandra, etc. are a few of the many such applications, each designed for a specific requirement. Underneath, all of these applications use the Hadoop framework, which mainly consists of a distributed file system (HDFS) and distributed processing (MapReduce).
Technically, once you have a bare-minimum Hadoop cluster (HDFS + MapReduce only), you can start writing MapReduce-based applications (in Java, or in other languages supported through Hadoop Streaming) to process some data.
What you could do is first download a pre-built, pre-configured Hadoop virtual machine image from the Cloudera or Hortonworks distribution and get it running on your machine. After that, start learning to write MapReduce jobs in Java and run them in your virtual machine.
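To make the MapReduce part less abstract, here is a minimal in-memory sketch of the map, shuffle, and reduce phases in plain Python. It does not use Hadoop at all; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`) and the sample temperature readings are made up for illustration. On a real cluster the same three phases run in parallel across many machines, with the input read from HDFS.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record and collect (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle_phase(pairs):
    """Group all values that share a key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Call the reducer once per key with the full list of that key's values."""
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def run_job(records, mapper, reducer):
    """Chain the three phases; a local stand-in for submitting a job."""
    return reduce_phase(shuffle_phase(map_phase(records, mapper)), reducer)


# Hypothetical job: find the highest temperature recorded in each year.
def temperature_mapper(line):
    year, temperature = line.split(",")
    yield year, int(temperature)


def max_reducer(key, values):
    return max(values)


if __name__ == "__main__":
    readings = ["1949,111", "1950,22", "1949,78", "1950,0"]
    print(run_job(readings, temperature_mapper, max_reducer))
```

The point of the split is that the mapper and reducer only ever see one record or one key at a time, which is what lets the framework distribute the work.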
Here is the URL to download Cloudera Hadoop Distribution VM
Here is the link to learn how to write the simplest word count job.
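If you would rather not start with Java, a word count can also be sketched in the Hadoop Streaming style, where the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output. The script below is only a rough sketch under that assumption: the file name `wordcount.py` and the `map` / `reduce` command-line argument are my own conventions, not anything Hadoop requires.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Sum the counts per word; assumes the input is already sorted by key,
    which is what the shuffle/sort step provides between map and reduce."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Only act as a command-line filter when a role ("map" or "reduce") is given.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = mapper if sys.argv[1] == "map" else reducer
    for output_line in step(sys.stdin):
        print(output_line)
```

You can try it without a cluster by piping a text file through `python wordcount.py map`, then `sort`, then `python wordcount.py reduce`, which imitates the sort the framework performs between the two phases. On a cluster, the same two commands would be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options; the jar's location depends on your distribution.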
Upvotes: 0
Reputation: 12020
There is an awesome article from the Yahoo developers on Apache Hadoop: Best Practices and Anti-Patterns.
Upvotes: 1