Reputation: 840
I am new to cloudera, I installed cloudera in my system successfully I have two doubts,
Consider a machine with some nodes already using hadoop with some data, Can we install Cloudera to use the existing Hadoop without made any changes or modifaction on data stored existing hadooop.
I installed Cloudera in my machine, I have another three machines to add those as clusters, I want to know, Am i want install cloudera in those three machines before add those machines as clusters ?, or Can we add a node as clusters without installing cloudera on that purticular nodes?.
Thanks in advance can anyone, please give some information about the above questions.
Upvotes: 1
Views: 381
Reputation: 181
Answer to your second question, you can add directly, with installation few pre requisites like openssh-clients and firewalls and java.
these machines( existing node, new three nodes) should accept same username and password (or) you should set passwordless ssh to these hosts..
you should connect to the internet while adding the nodes.
I hope it will help you:)
Upvotes: 0
Reputation: 5063
Answer to questions -
1. If you want to migrate to CDH from existing Apache Distribution, you can follow this link
Excerpt:
Overview
The migration process does require a moderate understanding of Linux system administration. You should make a plan before you start. You will be restarting some critical services such as the name node and job tracker, so some downtime is necessary. Given the value of the data on your cluster, you’ll also want to be careful to take recent back ups of any mission-critical data sets as well as the name node meta-data.
Backing up your data is most important if you’re upgrading from a version of Hadoop based on an Apache Software Foundation release earlier than 0.20.
2.CDH binary needs be installed and configured in all the nodes to have a CDH based cluster up and running.
Upvotes: 2
Reputation: 5239
From the Cloudera Manual
You can migrate the data from a CDH3 (or any Apache Hadoop) cluster to a CDH4 cluster by using a tool that copies out data in parallel, such as the DistCp tool offered in CDH4.
Regarding your second question,
Again from the manual page
Important: Before proceeding, you need to decide:
As a general rule: The NameNode and JobTracker run on the the same "master" host unless the cluster is large (more than a few tens of nodes), and the master host (or hosts) should not run the Secondary NameNode (if used), DataNode or TaskTracker services. In a large cluster, it is especially important that the Secondary NameNode (if used) runs on a separate machine from the NameNode. Each node in the cluster except the master host(s) should run the DataNode and TaskTracker services.
Additionally, if you use Cloudera Manager it will automatically do all the setup necessary i.e install the necessary selected components on the nodes in the cluster.
Off-topic: I had a bad habit of not referrring the manual properly. Have a clear look at it, it answers all our questions
Upvotes: 1