Parbat

Reputation: 9

Troubleshooting Apache Spark Connect Server with Docker Compose

Here's a Docker Compose setup for a distributed Apache Spark environment using Bitnami's Spark image. It includes a Spark master, a Spark Connect server, and two Spark workers.

All services are connected via a custom network; the master and the Connect server also mount a shared host directory for data.

Once I submit a task to the Spark Connect server on port 15002 from my local machine, the Spark master distributes the workload to the workers. After some time, I can see the output in the PyCharm console. However, the application continues to run on the master, and the workers keep processing it.

To resolve this, I need to manually kill the application. If I try to run a new application after this, I encounter a gRPC error with a status code of 2, indicating an unknown error.
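
For context, here is a minimal sketch of how the master's state can be checked from the host before I kill the application. It assumes the standalone master exposes its status as JSON at http://localhost:8080/json/ and that the response contains activeapps and completedapps fields; the endpoint path and field names are assumptions and may differ per Spark version.

# Query the standalone master's JSON status endpoint and list applications.
# Assumes the master Web UI is reachable on localhost:8080 (as mapped in the
# compose file below) and serves a JSON summary at /json/.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8080/json/") as resp:
    status = json.load(resp)

for app in status.get("activeapps", []):
    print("active:", app.get("id"), app.get("name"), app.get("state"))

for app in status.get("completedapps", []):
    print("completed:", app.get("id"), app.get("name"), app.get("state"))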

Docker Compose file:

version: '3.8'
services:
  spark-master:
    image: bitnami/spark
    container_name: spark-master
    environment:
      - SPARK_MODE=master
      - SPARK_MASTER_WEBUI_PORT=8080
      - SPARK_MASTER_PORT=7077
      - SPARK_SUBMIT_OPTIONS=--packages io.delta:delta-spark_2.12:3.2.0
      - SPARK_MASTER_HOST=spark-master
    ports:
      - 8080:8080
      - 7077:7077
    networks:
      - spark-network
    volumes:
      - /mnt/f/Thesis_Docs/Project/spark:/mnt

  spark-connect:
    image: bitnami/spark
    container_name: spark-connect
    environment:
      - SPARK_MODE=driver
      - SPARK_MASTER=spark://spark-master:7077
    ports:
      - 15002:15002
    networks:
      - spark-network
    depends_on:
      - spark-master
    command: ["/bin/bash", "-c", "/opt/bitnami/spark/sbin/start-connect-server.sh --master spark://spark-master:7077 --packages org.apache.spark:spark-connect_2.12:3.5.1"]
    volumes:
      - /mnt/f/Thesis_Docs/Project/spark:/mnt

  spark-worker:
    image: bitnami/spark
    container_name: spark-worker
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER=spark://spark-master:7077
      - SPARK_WORKER_CORES=2
      - SPARK_WORKER_MEMORY=2G
      - SPARK_WORKER_WEBUI_PORT=8081
    ports:
      - 8081:8081
    depends_on:
      - spark-master
    networks:
      - spark-network

  spark-worker2:
    image: bitnami/spark
    container_name: spark-worker2
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER=spark://spark-master:7077
      - SPARK_WORKER_CORES=2
      - SPARK_WORKER_MEMORY=2G
      - SPARK_WORKER_WEBUI_PORT=8082
    ports:
      - 8082:8082
    depends_on:
      - spark-master
    networks:
      - spark-network
networks:
  spark-network:

Python code running in PyCharm on the host machine:

from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# Perform your Spark operations
spark.range(15).show()
# Stop Spark session
spark.stop()
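
For completeness, here is the same client snippet restructured so that spark.stop() always runs and the gRPC failure is caught explicitly. This is only a sketch, not a fix; the exception class is the one that appears in the traceback further down.

from pyspark.sql import SparkSession
from pyspark.errors.exceptions.connect import SparkConnectGrpcException

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
try:
    # Same operation as above; on a rerun after killing the previous
    # application, this is where the StatusCode.UNKNOWN (grpc_status:2)
    # error is raised.
    spark.range(15).show()
except SparkConnectGrpcException as err:
    print("Spark Connect request failed:", err)
finally:
    # Always release the remote session, even if the job failed.
    spark.stop()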

Master Web UI on port 8080 (screenshot)

PyCharm output window (screenshot)

gRPC error with status code 2: once I kill the application running on the cluster through the Web UI, I can't run any other application on the same port.

pyspark.errors.exceptions.connect.SparkConnectGrpcException: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNKNOWN
    details = ""
    debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2024-07-11T09:52:19.743877+00:00", grpc_status:2, grpc_message:""}"

The expected behavior is for the Spark master to coordinate the distribution of tasks to the Spark workers, and for the Spark Connect server to allow the submission of tasks from my local machine.

However, there are a few issues to address to ensure the system functions correctly and to prevent the gRPC error with status code 2.

Also, once the application has completed in PyCharm and the result is shown in the PyCharm console, the application should appear in the list of completed applications on the Web UI.

Then I should be able to submit a new application on the same port from any IDE.

Upvotes: 0

Views: 247

Answers (0)
