okello

Reputation: 609

How to set Zookeeper dataDir in Docker (fig.yml)

I've configured Zookeeper and Kafka containers in a fig.yml file for Docker. Both containers start fine, but after sending a number of messages my application/zk-client hangs. On checking the Zookeeper logs, I see the error:

Error Path:/brokers Error:KeeperErrorCode = NoNode for /brokers

My fig.yml is as follows:

zookeeper:
  image: wurstmeister/zookeeper
  ports:
    - "2181:2181"
  environment:
    ZK_ADVERTISED_HOST_NAME: xx.xx.x.xxx
    ZK_CONNECTION_TIMEOUT_MS: 6000
    ZK_SYNC_TIME_MS: 2000
    ZK_DATADIR: /path/to/data/zk/data/dir
kafka:
  image: wurstmeister/kafka:0.8.2.0
  ports:
    - "xx.xx.x.xxx:9092:9092"
  links:
    - zookeeper:zk
  environment:
    KAFKA_ADVERTISED_HOST_NAME: xx.xx.x.xxx
    KAFKA_LOG_DIRS: /home/svc_cis4/dl
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock

I've searched for quite a while now but haven't found a solution yet. I've also tried setting the data directory in fig.yml using ZK_DATADIR: '/path/to/zk/data/dir', but it doesn't seem to help. Any assistance will be appreciated.

UPDATE

Content of /opt/kafka_2.10-0.8.2.0/config/server.properties:

broker.id=0
port=9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
num.partitions=1
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
log.cleaner.enable=false
zookeeper.connect=localhost:2181
zookeeper.connection.timeout.ms=6000
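
For reference, the file above can be dumped straight from the running container like this (kafkadocker_kafka_1 is just an example of fig's <project>_<service>_<n> container naming; check docker ps for the actual name):

# find the Kafka container's name
docker ps
# print the broker config from inside it
docker exec -it kafkadocker_kafka_1 cat /opt/kafka_2.10-0.8.2.0/config/server.properties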

Upvotes: 0

Views: 2654

Answers (2)

okello

Reputation: 609

The configuration that's been working for me without any issues for the last two days involves specifying host addresses for both Zookeeper and Kafka. My fig.yml content is:

zookeeper:
  image: wurstmeister/zookeeper
  ports:
    - "xx.xx.x.xxx:2181:2181"
kafka:
  image: wurstmeister/kafka:0.8.2.0
  ports:
    - "9092:9092"
  links:
    - zookeeper:zk
  environment:
    KAFKA_ADVERTISED_HOST_NAME: xx.xx.x.xxx
    KAFKA_NUM_REPLICA_FETCHERS: 4
    ...other env variables...
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
validator:
  build: .
  volumes:
    - .:/host
  entrypoint: /bin/bash
  command: -c 'java -jar /host/app1.jar'
  links:
    - zookeeper:zk
    - kafka
analytics:
  build: .
  volumes:
    - .:/host
  entrypoint: /bin/bash
  command: -c 'java -jar /host/app2.jar'
  links:
    - zookeeper:zk
    - kafka
loader:
  build: .
  volumes:
    - .:/host
  entrypoint: /bin/bash
  command: -c 'java -jar /host/app3.jar'
  links:
    - zookeeper:zk
    - kafka

And the accompanying Dockerfile content:

FROM ubuntu:trusty

MAINTAINER Wurstmeister

RUN apt-get update; apt-get install -y unzip openjdk-7-jdk wget git docker.io

RUN wget -q http://apache.mirrors.lucidnetworks.net/kafka/0.8.2.0/kafka_2.10-0.8.2.0.tgz -O /tmp/kafka_2.10-0.8.2.0.tgz
RUN tar xfz /tmp/kafka_2.10-0.8.2.0.tgz -C /opt

VOLUME ["/kafka"]

ENV KAFKA_HOME /opt/kafka_2.10-0.8.2.0
ADD start-kafka.sh /usr/bin/start-kafka.sh
ADD broker-list.sh /usr/bin/broker-list.sh
CMD start-kafka.sh
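
A typical way to bring this stack up and sanity-check it (these are standard fig commands, nothing specific to my setup):

# build the validator/analytics/loader images and start everything in the background
fig build
fig up -d
# check container status and tail the Kafka logs if something looks off
fig ps
fig logs kafka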

Upvotes: 0

Javier Cortejoso

Reputation: 9146

The problems you are having are not related to Zookeeper's data directory. The error Error Path:/brokers Error:KeeperErrorCode = NoNode for /brokers means that your application cannot find any broker znode in Zookeeper's data. This is probably happening because the Kafka container is not connecting correctly to Zookeeper, and looking at wurstmeister's images I think the problem is that the KAFKA_ADVERTISED_HOST_NAME variable may be set wrongly. I don't know if there is a reason to pass that value in as an environment variable, but in my view it is not a good approach. There are multiple ways to configure Kafka (in fact there is no need to set advertised.host.name at all: you can leave it commented out and Kafka will use the default hostname, which can be set with Docker), but a quick solution along those lines is to edit start-kafka.sh and rebuild the image:

#!/bin/bash

if [[ -z "$KAFKA_ADVERTISED_PORT" ]]; then
    export KAFKA_ADVERTISED_PORT=$(docker port `hostname` 9092 | sed -r "s/.*:(.*)/\1/g")
fi
if [[ -z "$KAFKA_BROKER_ID" ]]; then
    export KAFKA_BROKER_ID=$KAFKA_ADVERTISED_PORT
fi
if [[ -z "$KAFKA_LOG_DIRS" ]]; then
    export KAFKA_LOG_DIRS="/kafka/kafka-logs-$KAFKA_BROKER_ID"
fi
if [[ -z "$KAFKA_ZOOKEEPER_CONNECT" ]]; then
    export KAFKA_ZOOKEEPER_CONNECT=$(env | grep ZK.*PORT_2181_TCP= | sed -e 's|.*tcp://||' | paste -sd ,)
fi

if [[ -n "$KAFKA_HEAP_OPTS" ]]; then
    sed -r -i "s/^(export KAFKA_HEAP_OPTS)=\"(.*)\"/\1=\"$KAFKA_HEAP_OPTS\"/g" $KAFKA_HOME/bin/kafka-server-start.sh
    unset KAFKA_HEAP_OPTS
fi

for VAR in `env`
do
  if [[ $VAR =~ ^KAFKA_ && ! $VAR =~ ^KAFKA_HOME ]]; then
    kafka_name=`echo "$VAR" | sed -r "s/KAFKA_(.*)=.*/\1/g" | tr '[:upper:]' '[:lower:]' | tr _ .`
    env_var=`echo "$VAR" | sed -r "s/(.*)=.*/\1/g"`
    if egrep -q "(^|^#)$kafka_name=" $KAFKA_HOME/config/server.properties; then
        sed -r -i "s@(^|^#)($kafka_name)=(.*)@\2=${!env_var}@g" $KAFKA_HOME/config/server.properties #note that no config values may contain an '@' char
    else
        echo "$kafka_name=${!env_var}" >> $KAFKA_HOME/config/server.properties
    fi
  fi
done

###NEW###
IP=$(hostname --ip-address)
sed -i -e "s/^advertised.host.name.*/advertised.host.name=$IP/" $KAFKA_HOME/config/server.properties
###END###

$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties
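
A rough sketch of how the patched script could be applied (the repo URL is wurstmeister's kafka-docker project; re-tagging the local build with the same name your fig.yml references is an assumption about your setup):

# clone the image sources and patch start-kafka.sh as shown above
git clone https://github.com/wurstmeister/kafka-docker.git
cd kafka-docker
# rebuild under the tag that fig.yml already points at
docker build -t wurstmeister/kafka:0.8.2.0 .
# recreate the kafka container so it picks up the rebuilt image
fig stop kafka
fig rm kafka
fig up -d kafka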

If this doesn't solve your problem, you can get more information by starting a session inside the containers (i.e. docker exec -it kafkadocker_kafka_1 /bin/bash for Kafka and docker exec -it kafkadocker_zookeeper_1 /bin/bash for Zookeeper) and checking the Kafka logs or the Zookeeper console (/opt/zookeeper-3.4.6/bin/zkCli.sh) from there.
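
For example, a quick check from the Zookeeper side could look like this (the container name matches the example above; ls and get are standard zkCli commands, and broker id 0 is the one from the question's server.properties):

docker exec -it kafkadocker_zookeeper_1 /opt/zookeeper-3.4.6/bin/zkCli.sh
# at the zkCli prompt:
#   ls /brokers/ids      -> should list the registered broker ids, e.g. [0]
#   get /brokers/ids/0   -> shows the host and port the broker advertised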

Upvotes: 2
