Reputation: 636
I have a Docker image for Spark named spark-docker and the official Cassandra Docker image cassandra. I want to run a spark-submit job from spark-docker that writes data to Cassandra.
The Dockerfile for spark-docker is as follows:
FROM bde2020/spark-python-template:2.4.0-hadoop2.7
MAINTAINER Rafiul
RUN pip install --upgrade pip
RUN pip install pyspark cassandra-driver
I am using the following command to do that.
docker run -ti --network=dockers_default spark-docker:latest /spark/bin/spark-submit --conf spark.cassandra.connection.host=cassandra --packages datastax:spark-cassandra-connector:2.4.0-s_2.11 /app/data_extractor.py -f /app/dataset.tar
This extracts data from dataset.tar and saves it in Cassandra. But I am getting the following error:
cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
In my Python code I have done this:

from cassandra.cluster import Cluster

class CassandraSchemaGenerator:
    def __init__(self, keyspace):
        self.keyspace = keyspace
        # Cluster() with no arguments tries 127.0.0.1:9042 by default
        self.cluster = Cluster()
        self.cluster_conn = self.cluster.connect()
How can I get the IP address and port on which Cassandra is running and put them in my Python code so that it can connect to Cassandra?
Upvotes: 0
Views: 406
Reputation: 8646
You can't use 127.0.0.1 to connect from one container to another unless you are using --network=host.

So, do one of the following:

Switch to --network=host mode when starting the containers (this mode requires no port publishing).

Or (better) join both containers to the same user-defined network and use the container names as hostnames to connect between them:
docker network create foo
docker run --network=foo -d --name=cassy cassandra
docker run --network=foo -ti --name=spark spark-docker:latest ...
Note the --name argument: it gives the containers human-readable names.

Now, from the spark container you can connect to Cassandra using the hostname cassy instead of an IP address.
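For completeness, here is how the question's Python snippet could use that hostname. This is a minimal sketch assuming the container name cassy from the commands above and Cassandra's default CQL port 9042:

from cassandra.cluster import Cluster

class CassandraSchemaGenerator:
    def __init__(self, keyspace, host="cassy", port=9042):
        self.keyspace = keyspace
        # "cassy" is the Cassandra container's name; Docker's embedded DNS
        # resolves it to the container's IP on the user-defined network.
        self.cluster = Cluster([host], port=port)
        self.cluster_conn = self.cluster.connect()

The same name goes into the spark-submit flag, i.e. --conf spark.cassandra.connection.host=cassy, so that the Spark connector and the Python driver agree on the contact point.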
Upvotes: 1