Tom Panning

Reputation: 4772

How to manage configurations to connect to multiple Hadoop clusters?

What are the best practices for managing client configurations for multiple Hadoop clusters? By "client" I mean a machine that is not part of any cluster but is used by someone to submit jobs to one.

I can think of two possibilities: different virtual machines that are each configured for one cluster, or just extract and configure the tools in different directories on the same machine. But I'm not sure if one is clearly better than the other, or if there are other alternatives.
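The second approach (separate configuration directories on one machine) can be sketched with Hadoop's standard `HADOOP_CONF_DIR` environment variable. The directory names below are hypothetical examples, not paths from any particular installation:

```shell
# One configuration directory per cluster (example layout):
#   ~/hadoop-conf/prod  - configs for the large production cluster
#   ~/hadoop-conf/test  - configs for the testing/experimental cluster

# Point the Hadoop client tools at the desired cluster before submitting:
export HADOOP_CONF_DIR="$HOME/hadoop-conf/test"

# Subsequent hadoop commands read that directory's *-site.xml files, e.g.:
#   hadoop jar myjob.jar MyMainClass
echo "Using config dir: $HADOOP_CONF_DIR"
```

Switching clusters is then just a matter of re-exporting the variable, which avoids maintaining separate virtual machines.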

This seems like something that would be a general problem for many people working with Hadoop, but I will include my specific situation as an example. I have access to a large Hadoop cluster and a smaller testing/experimental Hadoop cluster. They have slightly different versions of some of the Hadoop tools since the testing cluster has a tool (Shark) that required a different version of another tool (Hive) that is installed on the main cluster.

Upvotes: 1

Views: 1067

Answers (1)

hba

Reputation: 7790

A Cloudera installation registers its configuration directories with the Linux alternatives system.

$ alternatives --display hadoop-conf
hadoop-conf - status is auto.
 link currently points to /etc/hadoop/conf.pseudo.mr1
/etc/hadoop/conf.empty - priority 10
/etc/hadoop/conf.pseudo.mr1 - priority 30
Current `best' version is /etc/hadoop/conf.pseudo.mr1.

You may be able to employ the same technique to switch between multiple configurations.
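For example, you could register one configuration directory per cluster and switch between them with `alternatives`. This is a sketch only; the directory names and priorities below are hypothetical, and the commands must be run as root:

```shell
# Register a config directory for each cluster under the same
# "hadoop-conf" alternative name (paths and priorities are examples):
alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.prod 50
alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.test 40

# Switch the active configuration explicitly (overrides priority):
alternatives --set hadoop-conf /etc/hadoop/conf.test

# Verify which directory /etc/hadoop/conf now points to:
alternatives --display hadoop-conf
```

Note that this switches the configuration machine-wide, so it suits a client box used by one person at a time; per-user switching is better handled with `HADOOP_CONF_DIR`.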

Here is a pretty good how-to.

Upvotes: 1
