Rafiul Sabbir

Reputation: 636

Communication between two docker images

I have a docker image for spark named spark-docker and the official cassandra docker image, cassandra. I want to run a spark-submit job from spark-docker which will write data to cassandra.

The Dockerfile for spark-docker is as follows:

FROM bde2020/spark-python-template:2.4.0-hadoop2.7

MAINTAINER Rafiul

RUN pip install --upgrade pip
RUN pip install pyspark cassandra-driver

I am using the following command to do that.

docker run -ti --network=dockers_default spark-docker:latest /spark/bin/spark-submit --conf spark.cassandra.connection.host=cassandra --packages datastax:spark-cassandra-connector:2.4.0-s_2.11 /app/data_extractor.py -f /app/dataset.tar

This will extract data from dataset.tar and will save the data in cassandra.

But I am getting the following error:

cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})

In my python code I have done this:

from cassandra.cluster import Cluster


class CassandraSchemaGenerator:
    def __init__(self, keyspace):
        self.keyspace = keyspace
        self.cluster = Cluster()
        self.cluster_conn = self.cluster.connect()

How can I get the IP address and port number on which cassandra is running and put it in my python code so that it can connect to cassandra?

Upvotes: 0

Views: 406

Answers (1)

grapes

Reputation: 8646

You can't use 127.0.0.1 to connect from one container to another unless you are using network=host. Inside a container, 127.0.0.1 refers to that container itself, not to the cassandra container.

So do one of the following:

Switch to network=host mode when starting both containers (in this mode no ports need to be published)

Or (better) join both containers to the same user-defined network and use the container names as host names to connect between them:

docker network create foo
docker run --network=foo -d  --name=cassy cassandra
docker run --network=foo -ti --name=spark spark-docker:latest ...

Note the --name argument: it gives the containers human-readable names. From the spark container you can now connect to cassandra using the host name cassy instead of an IP address.
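On the Python side, keep in mind that Cluster() with no arguments defaults to 127.0.0.1, which is exactly what the error message shows. The spark.cassandra.connection.host setting is read by the Spark connector, not by cassandra-driver, so the host name has to be passed to Cluster explicitly. Below is a minimal sketch of how the class from the question could do that. The environment variable names CASSANDRA_HOST and CASSANDRA_PORT and the helper cassandra_contact_points are my own assumptions for illustration, and 9042 is the port from the error message.

```python
import os


def cassandra_contact_points(default_host="cassy", default_port=9042):
    """Return (hosts, port) for the driver.

    CASSANDRA_HOST / CASSANDRA_PORT are hypothetical variable names used
    only in this sketch; the defaults match the container name above.
    """
    raw_hosts = os.environ.get("CASSANDRA_HOST", default_host)
    hosts = [h.strip() for h in raw_hosts.split(",") if h.strip()]
    port = int(os.environ.get("CASSANDRA_PORT", default_port))
    return hosts, port


class CassandraSchemaGenerator:
    def __init__(self, keyspace):
        # Imported here so the helper above works without the driver installed.
        from cassandra.cluster import Cluster

        hosts, port = cassandra_contact_points()
        self.keyspace = keyspace
        # Pass the container name instead of relying on the 127.0.0.1 default.
        self.cluster = Cluster(hosts, port=port)
        self.cluster_conn = self.cluster.connect()
```

With this in place you could add something like -e CASSANDRA_HOST=cassy to the docker run command for the spark container, or simply rely on the default. If your cassandra container is reachable under a different name on your network, use that name instead.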

Upvotes: 1
