user19930511
user19930511

Reputation: 389

Failed to connect to spark-master:7077

I am trying to deploy my spark application on Kubernetes. I followed the below steps:

Installed spark-kubernetes-operator:

helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install gcp-spark-operator spark-operator/spark-operator

Created a spark-app.py

from pyspark.sql.functions import *
from pyspark.sql import *
from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType,StructField, StringType, IntegerType

if __name__ == "__main__":

    spark = SparkSession.builder.appName('spark-on-kubernetes-test').getOrCreate() 

    data2 = [("James","","Smith","36636","M",3000),
    ("Michael","Rose","","40288","M",4000),
    ("Robert","","Williams","42114","M",4000),
    ("Maria","Anne","Jones","39192","F",4000),
    ("Jen","Mary","Brown","","F",-1)
  ]

    schema = StructType([ \
    StructField("firstname",StringType(),True), \
    StructField("middlename",StringType(),True), \
    StructField("lastname",StringType(),True), \
    StructField("id", StringType(), True), \
    StructField("gender", StringType(), True), \
    StructField("salary", IntegerType(), True) \
  ])
 
    df = spark.createDataFrame(data=data2,schema=schema)
    df.printSchema()
    df.show(truncate=False)
    print("program is completed !")

Then I created the new image with my application:

FROM bitnami/spark
USER root
WORKDIR /app
COPY requirements.txt .
RUN pip3 install -r requirements.txt
COPY spark-app.py .<

Then I created the spark-application.yaml file:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: pyspark-app
  namespace: default
spec:
  type: Python
  mode: cluster
  image: "test/spark-k8s-app:1.0"
  imagePullPolicy: Always
  mainApplicationFile: local:///app/spark-app.py
  sparkVersion: 3.3.0
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    serviceAccount: spark
    labels:
      version: 3.3.0
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.3.0
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"

But when I tried to deploy the yaml file, I am getting the below error:

$ kubectl logs pyspark-app-driver
←[38;5;6m ←[38;5;5m13:04:01.12 ←[0m
←[38;5;6m ←[38;5;5m13:04:01.12 ←[0m←[1mWelcome to the Bitnami spark container←[0m
←[38;5;6m ←[38;5;5m13:04:01.12 ←[0mSubscribe to project updates by watching ←[1mhttps://github.com/bitnami/containers←[0m
←[38;5;6m ←[38;5;5m13:04:01.12 ←[0mSubmit issues and feature requests at ←[1mhttps://github.com/bitnami/containers/issues←[0m
←[38;5;6m ←[38;5;5m13:04:01.12 ←[0m

22/09/24 13:04:04 INFO SparkContext: Running Spark version 3.3.0
22/09/24 13:04:04 INFO ResourceUtils: ==============================================================       
22/09/24 13:04:04 INFO ResourceUtils: No custom resources configured for spark.driver.
22/09/24 13:04:04 INFO ResourceUtils: ==============================================================       
22/09/24 13:04:04 INFO SparkContext: Submitted application: ml_framework
22/09/24 13:04:04 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 512, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)    
22/09/24 13:04:04 INFO ResourceProfile: Limiting resource is cpus at 1 tasks per executor
22/09/24 13:04:04 INFO ResourceProfileManager: Added ResourceProfile id: 0
22/09/24 13:04:04 INFO SecurityManager: Changing view acls to: root
22/09/24 13:04:04 INFO SecurityManager: Changing modify acls to: root
22/09/24 13:04:04 INFO SecurityManager: Changing view acls groups to:
22/09/24 13:04:04 INFO SecurityManager: Changing modify acls groups to:
22/09/24 13:04:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  
with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
22/09/24 13:04:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/09/24 13:04:05 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
22/09/24 13:04:05 INFO SparkEnv: Registering MapOutputTracker
22/09/24 13:04:05 INFO SparkEnv: Registering BlockManagerMaster
22/09/24 13:04:05 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/09/24 13:04:05 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/09/24 13:04:05 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
22/09/24 13:04:05 INFO DiskBlockManager: Created local directory at /var/data/spark-019ba05b-dba8-4350-a281-ffa35b54d840/blockmgr-20d5c478-8e93-42f6-85a9-9ed070f50b2b
22/09/24 13:04:05 INFO MemoryStore: MemoryStore started with capacity 117.0 MiB
22/09/24 13:04:05 INFO SparkEnv: Registering OutputCommitCoordinator
22/09/24 13:04:06 INFO Utils: Successfully started service 'SparkUI' on port 4040.
22/09/24 13:04:06 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-master:7077...
22/09/24 13:04:10 WARN TransportClientFactory: DNS resolution failed for spark-master:7077 took 4005 ms    
22/09/24 13:04:10 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master spark-master:7077   
org.apache.spark.SparkException: Exception thrown in awaitResult:
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:110)
        at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anon$1.run(StandaloneAppClient.scala:107)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Failed to connect to spark-master:7077
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
        at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
        ... 4 more
Caused by: java.net.UnknownHostException: spark-master
        at java.net.InetAddress.getAllByName0(InetAddress.java:1287)
        at java.net.InetAddress.getAllByName(InetAddress.java:1199)
        at java.net.InetAddress.getAllByName(InetAddress.java:1127)
        at java.net.InetAddress.getByName(InetAddress.java:1077)
        at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:156)
        at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:153)
        at java.security.AccessController.doPrivileged(Native Method)
        at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:153)
        at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:41)
        at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:61)
        at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:53)
        at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:55)        
        at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:31)        
        at io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:106)
        at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:206)
        at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:46)
        at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:180)
        at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:166)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:552)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
        at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:605)
        at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
        at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:990)        
        at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:516)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)
        at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)      
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)    
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        ... 1 more
22/09/24 13:04:26 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-master:7077...
22/09/24 13:04:30 WARN TransportClientFactory: DNS resolution failed for spark-master:7077 took 4006 ms    
22/09/24 13:04:30 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master spark-master:7077   
org.apache.spark.SparkException: Exception thrown in awaitResult:
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:110)
        at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anon$1.run(StandaloneAppClient.scala:107)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Failed to connect to spark-master:7077
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
        at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
        ... 4 more
Caused by: java.net.UnknownHostException: spark-master
        at java.net.InetAddress.getAllByName0(InetAddress.java:1287)
        at java.net.InetAddress.getAllByName(InetAddress.java:1199)
        at java.net.InetAddress.getAllByName(InetAddress.java:1127)
        at java.net.InetAddress.getByName(InetAddress.java:1077)
        at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:156)
        at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:153)
        at java.security.AccessController.doPrivileged(Native Method)
        at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:153)
        at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:41)
        at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:61)
        at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:53)
        at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:55)        
        at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:31)        
        at io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:106)
        at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:206)
        at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:46)
        at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:180)
        at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:166)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:552)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
        at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:605)
        at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
        at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:990)        
        at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:516)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)
        at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)      
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)9)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)    
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        ... 1 more

How can I resolve this?

Upvotes: 2

Views: 654

Answers (0)

Related Questions