Aaronontheweb

Reputation: 8414

DataStax Enterprise on Docker: fails to start due to /hadoop/conf directory not being writable

I've followed DataStax's guide on best practices for using DSE with Docker, but I've run into the following bug using all of the default setup scripts and Dockerfiles provided by DataStax.

Error Log

Caused by: java.lang.RuntimeException: Failed to save custom DSE Hadoop config
        at com.datastax.bdp.hadoop.mapred.CassandraJobConf.writeDseHadoopConfig(CassandraJobConf.java:310) ~[dse-hadoop-5.0.3.jar:5.0.3]
        at com.datastax.bdp.hadoop.mapred.CassandraJobConf.writeDseHadoopConfig(CassandraJobConf.java:174) ~[dse-hadoop-5.0.3.jar:5.0.3]
        at com.datastax.bdp.ConfigurationWriterPlugin.onActivate(ConfigurationWriterPlugin.java:20) ~[dse-hadoop-5.0.3.jar:5.0.3]
        at com.datastax.bdp.plugin.PluginManager.initialize(PluginManager.java:377) ~[dse-core-5.0.3.jar:5.0.3]
        at com.datastax.bdp.plugin.PluginManager.activateDirect(PluginManager.java:306) ~[dse-core-5.0.3.jar:5.0.3]
        ... 7 common frames omitted
Caused by: java.io.IOException: Directory not writable: /opt/dse/resources/hadoop/conf
        at com.datastax.bdp.hadoop.mapred.CassandraJobConf.saveConfiguration(CassandraJobConf.java:466) ~[dse-hadoop-5.0.3.jar:5.0.3]
        at com.datastax.bdp.hadoop.mapred.CassandraJobConf.saveDseHadoopConfiguration(CassandraJobConf.java:345) ~[dse-hadoop-5.0.3.jar:5.0.3]
        at com.datastax.bdp.hadoop.mapred.CassandraJobConf.writeDseHadoopConfig(CassandraJobConf.java:300) ~[dse-hadoop-5.0.3.jar:5.0.3]
        ... 11 common frames omitted
Unable to start DSE server: Unable to activate plugin com.datastax.bdp.ConfigurationWriterPlugin
com.datastax.bdp.plugin.PluginManager$PluginActivationException: Unable to activate plugin com.datastax.bdp.ConfigurationWriterPlugin
        at com.datastax.bdp.plugin.PluginManager.activateDirect(PluginManager.java:327)
        at com.datastax.bdp.plugin.PluginManager.activate(PluginManager.java:259)
        at com.datastax.bdp.plugin.PluginManager.activate(PluginManager.java:169)
        at com.datastax.bdp.plugin.PluginManager.preStart(PluginManager.java:77)
        at com.datastax.bdp.server.DseDaemon.preStart(DseDaemon.java:490)
        at com.datastax.bdp.server.DseDaemon.start(DseDaemon.java:462)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:583)
        at com.datastax.bdp.DseModule.main(DseModule.java:91)
Caused by: java.lang.RuntimeException: Failed to save custom DSE Hadoop config
        at com.datastax.bdp.hadoop.mapred.CassandraJobConf.writeDseHadoopConfig(CassandraJobConf.java:310)
        at com.datastax.bdp.hadoop.mapred.CassandraJobConf.writeDseHadoopConfig(CassandraJobConf.java:174)
        at com.datastax.bdp.ConfigurationWriterPlugin.onActivate(ConfigurationWriterPlugin.java:20)
        at com.datastax.bdp.plugin.PluginManager.initialize(PluginManager.java:377)
        at com.datastax.bdp.plugin.PluginManager.activateDirect(PluginManager.java:306)
        ... 7 more
Caused by: java.io.IOException: Directory not writable: /opt/dse/resources/hadoop/conf

The error is pretty straightforward. I tried to address it by adding some additional chmod calls in the Dockerfile, to no avail.

Dockerfile

# Provided without any warranty, these files are intended
# to accompany the whitepaper about DSE on Docker and are
# not intended for production and are not actively maintained.

# Loosely based on docker-cassandra by the fine folk at Spotify
# -- https://github.com/spotify/docker-cassandra/
# Loosely based on cassandra-docker by the one and only Al Tobey
# -- https://github.com/tobert/cassandra-docker/

# base yourself on any ubuntu 14.04 image containing JDK8
# official Docker Java images are distributed with OpenJDK
# Datastax certifies its product releases specifically
# on the Oracle/Sun JVM, so YMMV with OpenJDK

FROM nimmis/java:oracle-8-jdk

# Avoid ERROR: invoke-rc.d: policy-rc.d denied execution of start.
RUN echo "#!/bin/sh\nexit 0" > /usr/sbin/policy-rc.d

RUN export DEBIAN_FRONTEND=noninteractive && \
    apt-get update && \
    apt-get -y install adduser \
    curl \
    lsb-base \
    procps \
    zlib1g \
    gzip \
    python \
    python-support \
    sysstat \
    ntp bash tree && \
    rm -rf /var/lib/apt/lists/*

# grab gosu for easy step-down from root
RUN curl -o /bin/gosu -SkL "https://github.com/tianon/gosu/releases/download/1.4/gosu-$(dpkg --print-architecture)" \
    && chmod +x /bin/gosu


# The DSE tarball can be downloaded into the folder where the Dockerfile is:
# wget --user=$USER --password=$PASS http://downloads.datastax.com/enterprise/dse-5.0.0-bin.tar.gz
# you may want to replace dse-5.0.0-bin.tar.gz with the corresponding downloaded package name. When
# downloaded, please remove the version number part of the filename (or create a symlink), so the
# resulting file is named dse-bin.tar.gz (that way the docker file itself remains version independent).
#
# DataStax Agent debian package can be downloaded from
# wget --user=$USER --password=$PASS http://debian.datastax.com/enterprise/pool/datastax-agent_6.0.0_all.deb
# you may want to replace the specific version with the corresponding downloaded package name. When
# downloaded, please remove the version number part of the filename (or create a symlink), so the
# resulting file is named datastax-agent_all.deb (that way the docker file itself remains version
# independent).
ADD dse.tar.gz /opt
ADD datastax-agent_all.deb /tmp

ENV DSE_HOME /opt/dse

RUN ln -s /opt/dse* $DSE_HOME

# keep data here
VOLUME /data

# and logs here
VOLUME /logs

VOLUME /opt/dse

# create a dedicated user for running DSE node
RUN groupadd -g 1337 cassandra && \
    useradd -u 1337 -g cassandra -s /bin/bash -d $DSE_HOME cassandra && \
    chown -R cassandra:cassandra /opt/dse* 

RUN chmod -R u+rw /opt/dse/

# install the agent
RUN dpkg -i /tmp/datastax-agent_all.deb

# starting node using custom entrypoint that configures paths, interfaces, etc.
COPY scripts/dse-entrypoint /usr/local/bin/
RUN chmod +x /usr/local/bin/dse-entrypoint
ENTRYPOINT ["/usr/local/bin/dse-entrypoint"]

# Running any other DSE/C* command should be done on behalf of the dse user
# Perform that using a generic command launcher
COPY scripts/dse-cmd-launcher /usr/local/bin/
RUN chmod +x /usr/local/bin/dse-cmd-launcher

# link dse commands to the launcher
RUN for cmd in cqlsh dsetool nodetool dse cassandra-stress; do \
        ln -sf /usr/local/bin/dse-cmd-launcher /usr/local/bin/$cmd ; \
    done

# the detailed list of ports
# http://docs.datastax.com/en/datastax_enterprise/5.0/datastax_enterprise/sec/secConfFirePort.html

# Cassandra
EXPOSE 7000 9042 9160

# Solr
EXPOSE 8983 8984

# Spark
EXPOSE 4040 7080 7081 7077

# Hadoop
EXPOSE 8012 50030 50060 9290

# Hive/Shark
EXPOSE 10000

# Graph

The last place where there might be an answer to this issue is the startup script used to actually launch DSE when the container starts.

DSE Startup Script (Called by Docker container on startup)

#!/bin/sh

# Provided without any warranty, these files are intended
# to accompany the whitepaper about DSE on Docker and are 
# not intended for production and are not actively maintained.

# Bind the various services
# These should be updated on every container start

if [ -z "${IP}" ]; then
  IP=`hostname --ip-address`
fi

echo $IP > /data/ip.address

# create directories for holding the node's data, logs, etc.
create_dirs() {
  local base_dir=$1;

  mkdir -p $base_dir/data/commitlog
  mkdir -p $base_dir/data/saved_caches
  mkdir -p $base_dir/data/hints
  mkdir -p $base_dir/logs
}

# tweak the cassandra config
tweak_cassandra_config() {
  env="$1/cassandra-env.sh"
  conf="$1/cassandra.yaml"

  base_data_dir="/data"

  # Set the cluster name
  if [ -z "${CLUSTER_NAME}" ]; then
    printf " - No cluster name provided; skipping.\n"
  else
    printf " - Setting up the cluster name: ${CLUSTER_NAME}\n"
    regexp="s/Test Cluster/${CLUSTER_NAME}/g"
    sed -i -- "$regexp" $conf
  fi

  # Set the commitlog directory, and various other directories
# These are done only once since the regexp matches will fail on subsequent
  # runs.
  printf " - Setting up directories\n"
  regexp="s|/var/lib/cassandra/|$base_data_dir/|g"
  sed -i -- "$regexp" $conf
  regexp="s/^listen_address:.*/listen_address: ${IP}/g"
  sed -i -- "$regexp" $conf
  regexp="s/rpc_address:.*/rpc_address: ${IP}/g"
  sed -i -- "$regexp" $conf

  # seeds
  if [ -z "${SEEDS}" ]; then
    printf " - Using own IP address ${IP} as seed.\n";
    regexp="s/seeds:.*/seeds: \"${IP}\"/g";
  else
    printf " - Using seeds: $SEEDS\n";
    regexp="s/seeds:.*/seeds: \"${IP},${SEEDS}\"/g"
  fi
  sed -i -- "$regexp" $conf

  # JMX
  echo "JVM_OPTS=\"\$JVM_OPTS -Djava.rmi.server.hostname=127.0.0.1\"" >> $env
}

tweak_dse_in_sh() {
  # point C* logs dir to the created volume
  sed -i -- "s|/var/log/cassandra|/logs|g" "$1/dse.in.sh"
}

tweak_spark_config() {
  sed -i -- "s|/var/lib/spark/|/data/spark/|g" "$1/spark-env.sh"
  sed -i -- "s|/var/log/spark/|/logs/spark/|g" "$1/spark-env.sh"
  mkdir -p /data/spark/worker
  mkdir -p /data/spark/rdd
  mkdir -p /logs/spark/worker
}

tweak_agent_config() {
  [ -d "/var/lib/datastax-agent" ] && cat > /var/lib/datastax-agent/conf/address.yaml <<EOF
stomp_interface: ${STOMP_INTERFACE}
use_ssl: 0
local_interface: ${IP}
hosts: ["${IP}"]
cassandra_install_location: /opt/dse
cassandra_log_location: /logs
EOF
  chown cassandra:cassandra /var/lib/datastax-agent/conf/address.yaml
}

setup_node() {
  printf "* Setting up node...\n"
  printf " + Setting up node...\n"

  create_dirs
  tweak_cassandra_config "$DSE_HOME/resources/cassandra/conf"
  tweak_dse_in_sh "$DSE_HOME/bin"
  tweak_spark_config "$DSE_HOME/resources/spark/conf"
  tweak_agent_config
  chown -R cassandra:cassandra /data /logs /conf

  # mark that we tweaked configs
  touch "$DSE_HOME/tweaked_configs"

  printf "Done.\n"
}

# if marker file doesn't exist, setup node
[ ! -f "$DSE_HOME/tweaked_configs" ] && setup_node

[ -f "/etc/init.d/datastax-agent" ] && /etc/init.d/datastax-agent start

exec gosu cassandra "$DSE_HOME/bin/dse" cassandra -f "$@"

Docker Container Commandline Arguments

And here's the commandline arguments I'm using to launch a single DSE instance via Docker:

#!/bin/bash

# Used to start a single DSE node that has both Spark and Cassandra running on it
OPSC_CONTAINER=$1

if [ -z "$OPSC_CONTAINER" ]; then
  echo "usage: start_docker_cluster.sh OPSCContainerName"
  echo "  OPSCContainerName   mandatory name of the container running OpsCenter"
  exit 1
fi

[ -z "$CLUSTER_NAME" ] && CLUSTER_NAME="Test_Cluster"

STOMP_INTERFACE=`docker exec $OPSC_CONTAINER hostname -I`
docker run -p 7080:7080 -p 4040:4040 -p 7077:7077 -p 9042:9042 --link $OPSC_CONTAINER -d -e CLUSTER_NAME="$CLUSTER_NAME" -e STOMP_INTERFACE="$STOMP_INTERFACE" --name dse dse -k -t

The -k and -t flags indicate that we're launching both Spark and Hadoop for this container. I've dropped the -t flag and still had this configuration error occur.
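
For reference, the entrypoint forwards those container arguments via "$@", so with DSE_HOME set to /opt/dse (as in the Dockerfile above) the node is effectively started with roughly:

gosu cassandra /opt/dse/bin/dse cassandra -f -k -t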

What do I need to do to make the /opt/dse/resources/hadoop/conf directory writable so DSE can successfully boot?

Upvotes: 2

Views: 807

Answers (3)

sirolf2009

Reputation: 868

Doing this, as answered by Max, worked for me:

I added chown -RHh cassandra:cassandra /opt/dse in the setup_node() portion of DSE Startup Script (Called by Docker container on startup)

But instead of his issue I got:

Unable to activate plugin com.datastax.bdp.plugin.DseFsPlugin
(...)
java.io.IOException: Failed to create work directory: /var/lib/dsefs

So I had to change my setup_node() to this:

setup_node() {
  printf "* Setting up node...\n"
  printf " + Setting up node...\n"

  create_dirs
  tweak_cassandra_config "$DSE_HOME/resources/cassandra/conf"
  tweak_dse_in_sh "$DSE_HOME/bin"
  tweak_spark_config "$DSE_HOME/resources/spark/conf"
  tweak_agent_config
  chown -R cassandra:cassandra /data /logs /conf

  mkdir /var/lib/dsefs
  chown -RHh cassandra:cassandra /opt/dse /var/lib/dsefs

  # mark that we tweaked configs
  touch "$DSE_HOME/tweaked_configs"

  printf "Done.\n"
}

Upvotes: 0

afulay

Reputation: 66

Adding 'chown -RHh cassandra:cassandra /opt/dse' to the entrypoint script solved my problem of not being able to write to /opt/dse/resources/hadoop/conf.

Regarding ERROR 04:15:04,789 SPARK-WORKER Logging.scala:74 - Failed to create work directory /var/lib/spark/worker:

Check your spark-env.sh and review your directory mappings. In my case, I have mounted two external volumes, /data and /logs, and both directories are owned by cassandra:cassandra (a short sketch of the matching mkdir/chown steps follows the snippet below).

# This is a base directory for Spark Worker work files.
if [ "x$SPARK_WORKER_DIR" = "x" ]; then
    export SPARK_WORKER_DIR="/data/spark/worker"
fi

if [ "x$SPARK_LOCAL_DIRS" = "x" ]; then
    export SPARK_LOCAL_DIRS="/data/spark/rdd"
fi

# This is a base directory for Spark Worker logs.
if [ "x$SPARK_WORKER_LOG_DIR" = "x" ]; then
   export SPARK_WORKER_LOG_DIR="/logs/spark/worker"
fi

# This is a base directory for Spark Master logs.
if [ "x$SPARK_MASTER_LOG_DIR" = "x" ]; then
   export SPARK_MASTER_LOG_DIR="/logs/spark/master"
fi
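
To make sure those paths exist and are writable by the dse process, a minimal sketch (mirroring the mkdir and chown -R cassandra:cassandra /data /logs calls already in the entrypoint script; the /logs/spark/master directory is an extra implied by SPARK_MASTER_LOG_DIR above) would be:

# create the Spark worker/rdd/log directories on the mounted volumes
mkdir -p /data/spark/worker /data/spark/rdd /logs/spark/worker /logs/spark/master
# hand both volumes to the user DSE runs as
chown -R cassandra:cassandra /data /logs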

This video shows a fully functional DataStax Enterprise setup running on Docker: https://vimeo.com/181393134

Upvotes: 2

Max

Reputation: 458

I added chown -RHh cassandra:cassandra /opt/dse in the setup_node() portion of the DSE startup script (called by the Docker container on startup) and it fixed the issue. Check out chown --help for more info on those options: -R recurses, while -H and -h control how symlinks are handled, which matters here since /opt/dse is itself a symlink to the unpacked DSE directory.

NOTE: I'm now getting ERROR 04:15:04,789 SPARK-WORKER Logging.scala:74 - Failed to create work directory /var/lib/spark/worker later on, but at least this fix will get you past your initial issue.

setup_node() {
  printf "* Setting up node...\n"
  printf " + Setting up node...\n"

  create_dirs
  tweak_cassandra_config "$DSE_HOME/resources/cassandra/conf"
  tweak_dse_in_sh "$DSE_HOME/bin"
  tweak_spark_config "$DSE_HOME/resources/spark/conf"
  tweak_agent_config
  tweak_dse_config "$DSE_HOME/resources/dse/conf"
  chown -R cassandra:cassandra /data /logs /conf

  chown -RHh cassandra:cassandra /opt/dse

  # mark that we tweaked configs
  touch "$DSE_HOME/tweaked_configs"

  printf "Done.\n"
}

Upvotes: 1
