khawarizmi

Reputation: 732

Apache Accumulo Installation

I am trying to install Apache Accumulo 2.0 with a Hadoop cluster and a ZooKeeper cluster already running as separate Docker containers.

Now I must set the environment variables below, as per the installation instructions:

############################
# Variables that must be set
############################

## Hadoop installation
export HADOOP_HOME="${HADOOP_HOME:-/path/to/hadoop}"
## Hadoop configuration
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_HOME}/etc/hadoop}"
## Zookeeper installation
export ZOOKEEPER_HOME="${ZOOKEEPER_HOME:-/path/to/zookeeper}"

However, these directories do not exist on my local machine. Do I have to copy them from the individual Hadoop and ZooKeeper containers in order to make them available on the local machine where I am trying to run Accumulo? Or is there a proper way to configure this?

Upvotes: 0

Views: 465

Answers (2)

Christopher

Reputation: 2512

There are two main purposes of the conf/accumulo-env.sh script in Accumulo 2.0:

  1. To set up any environment variables, such as CLASSPATH, and
  2. To set up any JVM arguments to pass to the java command.

In its simplest form, bin/accumulo basically does:

  source conf/accumulo-env.sh
  java "${JAVA_OPTS[@]}" "$@"

So, any environment you export in accumulo-env.sh, such as CLASSPATH, will be set for the call to java. And, any options you set up in the JAVA_OPTS array will be passed along to java.
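That interaction can be sketched like this; the paths and JVM options below are made-up placeholders for illustration, not defaults from the real script:

```shell
## Sketch of the pattern accumulo-env.sh follows. The CLASSPATH entries
## and JVM options here are placeholder assumptions, not real defaults.

# Anything exported here is visible to the java process that
# bin/accumulo launches, including CLASSPATH.
export CLASSPATH="/path/to/hadoop/client/*:/path/to/zookeeper/lib/*"

# JVM arguments are collected in a bash array...
JAVA_OPTS=("-Xmx1g" "-Dsome.property=value")

# ...and bin/accumulo then effectively finishes with:
#   java "${JAVA_OPTS[@]}" "$@"
echo "${JAVA_OPTS[@]}"
```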

The accumulo-env.sh contents are expected to be customized by the user. The default contents of the script try to set up the CLASSPATH environment for Accumulo processes by using your current installations of Hadoop and ZooKeeper. However, you must tell it where these are located on your system in order for this to work. That is the purpose of these 'must be set' variables. Accumulo requires the client libraries from Hadoop and ZooKeeper, as well as the Hadoop configuration files, to be on the Accumulo CLASSPATH. If these are not present locally, you will need to figure out how to get them on your CLASSPATH for use by Accumulo, and update this environment script to let Accumulo know where you put them.
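For example, if you copied the client libraries and Hadoop config out of your containers to some directory on the Accumulo machine, the 'must be set' section might look like this (the /opt paths are assumptions; substitute wherever you actually put the files):

```shell
## Hypothetical locations; point these at wherever the Hadoop and
## ZooKeeper files actually live on the Accumulo machine.
export HADOOP_HOME="/opt/hadoop"
export HADOOP_CONF_DIR="${HADOOP_HOME}/etc/hadoop"
export ZOOKEEPER_HOME="/opt/zookeeper"
```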

If you wish to set up your CLASSPATH differently, you are free to do so, but you will probably want to customize a larger portion of the accumulo-env.sh script. This is a likely scenario for advanced users who have customized their deployment, or for vendors who have customized their vendor-provided build of Accumulo.

Upvotes: 1

OneCricketeer

Reputation: 191681

Accumulo requires the Hadoop XML config files, as does any Hadoop client. It finds these using $HADOOP_CONF_DIR or $HADOOP_HOME/conf.

It uses $HADOOP_HOME/lib to get the Hadoop JARs.

I'm not sure exactly what $ZOOKEEPER_HOME is used for, but I assume Accumulo does not ship with the ZooKeeper JARs either.

Accumulo will use these locations to find the Hadoop and ZooKeeper JARs and add them to your CLASSPATH variable.

So, yes, you need to copy them out of the containers, or download the Hadoop libraries on the host and volume-mount them into the container instead. You should already have volume mounts for ZooKeeper and the NameNode and DataNodes anyway.
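For instance, one way to do the copy is with docker cp; the container names, image name, and /opt paths here are assumptions, so substitute your own:

```shell
## Copy the Hadoop and ZooKeeper installations out of running
## containers onto the host (container names are assumptions):
docker cp namenode:/opt/hadoop /opt/hadoop
docker cp zookeeper:/opt/zookeeper /opt/zookeeper

## Alternatively, keep the installation on the host and bind-mount it
## into the container at start-up, so both sides share one copy
## (my-hadoop-image is a placeholder):
docker run -d --name namenode \
  -v /opt/hadoop:/opt/hadoop \
  my-hadoop-image
```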

Upvotes: 1
