Mostafa Ghadimi

Reputation: 6786

Using Docker and connectors to connect Kafka to Spark, Spark to Cassandra, and Kafka to Cassandra

Detail: We have dockerized Kafka, Cassandra and Spark, using wurstmeister/kafka, strapdata/elassandra and bde2020/spark-master images in docker-compose.

What we want to do is connect the following using connectors: Kafka to Spark, Spark to Cassandra, and Kafka to Cassandra.

The problem is that we don't know whether the setup works correctly, because these technologies are new to us.

Important Files:

docker-compose.yml

version: '2'
services:
  spark:
    container_name: spark
    image: bde2020/spark-master
    ports: 
      - 9180:8080
      - 9177:7077
      - 9181:8081
    links: 
      - elassandra
    volumes:
      - /home/mostafa/Desktop/kafka-test/together/cassandra/mostafa-hosein:/var/lib/docker/volumes/data/python

  elassandra:
    image: strapdata/elassandra
    container_name: elassandra
    build: /home/mostafa/Desktop/kafka-test/together/cassandra
    env_file:
      - /home/mostafa/Desktop/kafka-test/together/cassandra/conf/cassandra.env
    volumes:
      - /home/mostafa/Desktop/kafka-test/together/cassandra/jarfile:/var/lib/docker/volumes/data/_data
    ports:
      - '7000:7000'
      - '7001:7001'
      - '7199:7199'
      - '9042:9042'
      - '9142:9142'
      - '9160:9160'
      - '9200:9200'
      - '9300:9300'

  zookeeper:
    image: wurstmeister/zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"

  kafka:
    build: .
    container_name: kafka
    links:
     - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: localhost
      KAFKA_ADVERTISED_PORT: 9092
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_OPTS: -javaagent:/usr/app/jmx_prometheus_javaagent.jar=7071:/usr/app/prom-jmx-agent-config.yml
      CONNECTORS: elassandra
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on: 
      - elassandra

  kafka_connect-cassandra:
    image: datamountaineer/kafka-connect-cassandra
    container_name: kafka-connect-cassandra
    ports:
      - 8083:8083
      - 9102:9102
    environment: 
      - connect.cassandra.contact.points=localhost
      - KAFKA_ZOOKEEPER_CONNECT =  "zookeeper:2181"
      - KAFKA_ADVERTISED_LISTENERS= "kafka:9092"
      - connect.cassandra.port=9042
      - connector.class=com.datamountaineer.streamreactor.connect.cassandra.sink.CassandraSinkConnector
      - tasks.max=1
    depends_on:
      - kafka
      - elassandra

Dockerfile

FROM wurstmeister/kafka
ADD prom-jmx-agent-config.yml /usr/app/prom-jmx-agent-config.yml
ADD jmx_prometheus_javaagent-0.10.jar /usr/app/jmx_prometheus_javaagent.jar
COPY wait-for-it.sh /wait-for-it.sh
RUN chmod +x /wait-for-it.sh
CMD ["/wait-for-it.sh", "zookeeper:2181", "--", "start-kafka.sh"]

Example: I have added CONNECTORS: elassandra to the environment variables of the kafka container. I haven't seen any error, but I'm not sure whether it is a valid environment variable.

How can we validate the environment variables and test that the connectors are working?

Upvotes: 1

Views: 1897

Answers (1)

OneCricketeer

Reputation: 192043

As mentioned, CONNECTORS is not a valid variable for the Kafka container. Kafka Connect is a separate service from the broker, so it needs to run in a separate container.

Kafka Connect exposes a REST API at port 8083.

You need to send HTTP requests (using curl, Postman, etc.) to create connectors; they cannot be loaded from environment variables alone.
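As a rough sketch of what such a request could look like, the Python below builds the POST /connectors call that Kafka Connect's REST API expects. It assumes port 8083 is published on the Docker host as in the question's compose file; the connector name `cassandra-sink` and the topic `example-topic` are hypothetical, and the Cassandra sink will most likely need further properties (such as a keyspace and a topic-to-table mapping) that should be taken from the connector's own documentation.

```python
import json
import urllib.request

# Assumption: the REST port 8083 is published on the Docker host, so a client
# running on the host (outside the containers) can reach it via localhost.
CONNECT_URL = "http://localhost:8083"


def build_register_request(connect_url, name, config):
    """Build (but do not send) the POST /connectors request for a connector."""
    payload = {"name": name, "config": config}
    return urllib.request.Request(
        url=f"{connect_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register_connector(connect_url, name, config):
    """Send the request and return the worker's parsed JSON response."""
    request = build_register_request(connect_url, name, config)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Property names come from the question; the topic name is hypothetical.
cassandra_sink_config = {
    "connector.class": "com.datamountaineer.streamreactor.connect.cassandra.sink.CassandraSinkConnector",
    "tasks.max": "1",
    "topics": "example-topic",  # hypothetical topic name
    "connect.cassandra.contact.points": "elassandra",  # Docker service name
    "connect.cassandra.port": "9042",
}
```

With the stack running, `register_connector(CONNECT_URL, "cassandra-sink", cassandra_sink_config)` would submit the connector; an HTTP error response from the worker usually points to an invalid or incomplete configuration.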

I am not immediately aware of any specific properties needed for the Datamountaineer containers, but they are built on top of the Confluent images, and you can find all those environment variables here - https://github.com/confluentinc/cp-docker-images/blob/5.1.2-post/examples/cp-all-in-one/docker-compose.yml#L64-L86

These are for the Kafka container, not Kafka Connect, since they start with KAFKA_:

  - KAFKA_ZOOKEEPER_CONNECT =  "zookeeper:2181"
  - KAFKA_ADVERTISED_LISTENERS= "kafka:9092"

And these are connector properties (which would be POSTed as JSON), not environment variables:

  - connect.cassandra.contact.points=localhost
  - connect.cassandra.port=9042
  - connector.class=com.datamountaineer.streamreactor.connect.cassandra.sink.CassandraSinkConnector
  - tasks.max=1

Also, localhost shouldn't be used anywhere in these properties, because inside a container it refers to that container itself. If you want the Connect container to reach Cassandra, use "connect.cassandra.contact.points": "elassandra" (the Docker service name).
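To test whether a connector is actually working after it has been created, one option is to poll the status endpoint of the Connect REST API (GET /connectors/<name>/status) and check the reported states. The following is a minimal sketch under the same assumptions as before (port 8083 reachable from where the script runs, and a hypothetical connector name); the helper names are illustrative and not part of any library.

```python
import json
import urllib.request


def fetch_status(connect_url, name):
    """GET /connectors/<name>/status from a Kafka Connect worker."""
    with urllib.request.urlopen(f"{connect_url}/connectors/{name}/status") as response:
        return json.load(response)


def is_running(status):
    """True when the connector and at least one task exist and all report RUNNING."""
    connector_ok = status.get("connector", {}).get("state") == "RUNNING"
    tasks = status.get("tasks", [])
    return connector_ok and bool(tasks) and all(
        task.get("state") == "RUNNING" for task in tasks
    )


def failed_task_traces(status):
    """Collect the stack traces that Connect attaches to FAILED tasks."""
    return [
        task.get("trace", "")
        for task in status.get("tasks", [])
        if task.get("state") == "FAILED"
    ]
```

For example, `is_running(fetch_status("http://localhost:8083", "cassandra-sink"))` returning False, together with a non-empty result from `failed_task_traces`, would suggest looking at the trace (an unreachable contact point is a common cause) before checking the data in Cassandra itself.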

Upvotes: 1
