Jean-Eric Dby
Jean-Eric Dby

Reputation: 51

Kubernetes 1.9 can't initialize SparkContext

Trying to catch up with the Spark 2.3 documentation on how to deploy jobs on a Kubernetes 1.9.3 cluster : http://spark.apache.org/docs/latest/running-on-kubernetes.html

The Kubernetes 1.9.3 cluster is operating properly on offline bare-metal servers and was installed with kubeadm. The following command was used to submit the job (SparkPi example job):

/opt/spark/bin/spark-submit --master k8s://https://k8s-master:6443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=2 --conf spark.kubernetes.container.image=spark:v2.3.0 local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar 

Here is the stacktrace that we all love:

++ id -u
+ myuid=0
++ id -g
+ mygid=0
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/ash
+ '[' -z root:x:0:0:root:/root:/bin/ash ']'
+ SPARK_K8S_CMD=driver
+ '[' -z driver ']'
+ shift 1
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_JAVA_OPTS
+ '[' -n /opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar:/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*:/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar:/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar'
+ '[' -n '' ']'
+ case "$SPARK_K8S_CMD" in
+ CMD=(${JAVA_HOME}/bin/java "${SPARK_JAVA_OPTS[@]}" -cp "$SPARK_CLASSPATH" -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS $SPARK_DRIVER_ARGS)
+ exec /sbin/tini -s -- /usr/lib/jvm/java-1.8-openjdk/bin/java -Dspark.kubernetes.driver.pod.name=spark-pi-b6f8a60df70a3b9d869c4e305518f43a-driver -Dspark.driver.port=7078 -Dspark.submit.deployMode=cluster -Dspark.master=k8s://https://k8s-master:6443 -Dspark.kubernetes.executor.podNamePrefix=spark-pi-b6f8a60df70a3b9d869c4e305518f43a -Dspark.driver.blockManager.port=7079 -Dspark.app.id=spark-7077ad8f86114551b0ae04ae63a74d5a -Dspark.driver.host=spark-pi-b6f8a60df70a3b9d869c4e305518f43a-driver-svc.default.svc -Dspark.app.name=spark-pi -Dspark.kubernetes.container.image=spark:v2.3.0 -Dspark.jars=/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar,/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar -Dspark.executor.instances=2 -cp ':/opt/spark/jars/*:/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar:/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar' -Xms1g -Xmx1g -Dspark.driver.bindAddress=10.244.1.17 org.apache.spark.examples.SparkPi
2018-03-07 12:39:35 INFO  SparkContext:54 - Running Spark version 2.3.0
2018-03-07 12:39:36 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-03-07 12:39:36 INFO  SparkContext:54 - Submitted application: Spark Pi
2018-03-07 12:39:36 INFO  SecurityManager:54 - Changing view acls to: root
2018-03-07 12:39:36 INFO  SecurityManager:54 - Changing modify acls to: root
2018-03-07 12:39:36 INFO  SecurityManager:54 - Changing view acls groups to: 
2018-03-07 12:39:36 INFO  SecurityManager:54 - Changing modify acls groups to: 
2018-03-07 12:39:36 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
2018-03-07 12:39:36 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 7078.
2018-03-07 12:39:36 INFO  SparkEnv:54 - Registering MapOutputTracker
2018-03-07 12:39:36 INFO  SparkEnv:54 - Registering BlockManagerMaster
2018-03-07 12:39:36 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-03-07 12:39:36 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-03-07 12:39:36 INFO  DiskBlockManager:54 - Created local directory at /tmp/blockmgr-7f5370ad-b495-4943-ad75-285b7ead3e5b
2018-03-07 12:39:36 INFO  MemoryStore:54 - MemoryStore started with capacity 408.9 MB
2018-03-07 12:39:36 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
2018-03-07 12:39:36 INFO  log:192 - Logging initialized @1936ms
2018-03-07 12:39:36 INFO  Server:346 - jetty-9.3.z-SNAPSHOT
2018-03-07 12:39:36 INFO  Server:414 - Started @2019ms
2018-03-07 12:39:36 INFO  AbstractConnector:278 - Started ServerConnector@4215838f{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-03-07 12:39:36 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5b6813df{/jobs,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@495083a0{/jobs/json,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5fd62371{/jobs/job,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2b62442c{/jobs/job/json,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@66629f63{/stages,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@841e575{/stages/json,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@27a5328c{/stages/stage,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6b5966e1{/stages/stage/json,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@65e61854{/stages/pool,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1568159{/stages/pool/json,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4fcee388{/storage,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6f80fafe{/storage/json,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3af17be2{/storage/rdd,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@f9879ac{/storage/rdd/json,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@37f21974{/environment,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5f4d427e{/environment/json,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6e521c1e{/executors,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@224b4d61{/executors/json,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5d5d9e5{/executors/threadDump,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@303e3593{/executors/threadDump/json,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4ef27d66{/static,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@62dae245{/,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4b6579e8{/api,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3954d008{/jobs/job/kill,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2f94c4db{/stages/stage/kill,null,AVAILABLE,@Spark}
2018-03-07 12:39:36 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://spark-pi-b6f8a60df70a3b9d869c4e305518f43a-driver-svc.default.svc:4040
2018-03-07 12:39:36 INFO  SparkContext:54 - Added JAR /opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar at spark://spark-pi-b6f8a60df70a3b9d869c4e305518f43a-driver-svc.default.svc:7078/jars/spark-examples_2.11-2.3.0.jar with timestamp 1520426376949
2018-03-07 12:39:37 WARN  KubernetesClusterManager:66 - The executor's init-container config map is not specified. Executors will therefore not attempt to fetch remote or submitted dependencies.
2018-03-07 12:39:37 WARN  KubernetesClusterManager:66 - The executor's init-container config map key is not specified. Executors will therefore not attempt to fetch remote or submitted dependencies.
2018-03-07 12:39:42 ERROR SparkContext:91 - Error initializing SparkContext.
org.apache.spark.SparkException: External scheduler cannot be instantiated
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2747)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:492)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
    at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
    at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [Pod]  with name: [spark-pi-b6f8a60df70a3b9d869c4e305518f43a-driver]  in namespace: [default]  failed.
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:62)
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:71)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:228)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:184)
    at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.<init>(KubernetesClusterSchedulerBackend.scala:70)
    at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:120)
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2741)
    ... 8 more
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
    at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
    at java.net.InetAddress.getAllByName(InetAddress.java:1192)
    at java.net.InetAddress.getAllByName(InetAddress.java:1126)
    at okhttp3.Dns$1.lookup(Dns.java:39)
    at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171)
    at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:137)
    at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:82)
    at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:171)
    at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
    at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
    at okhttp3.RealCall.execute(RealCall.java:69)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:377)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:312)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:295)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:783)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:217)
    ... 12 more
2018-03-07 12:39:42 INFO  AbstractConnector:318 - Stopped Spark@4215838f{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-03-07 12:39:42 INFO  SparkUI:54 - Stopped Spark web UI at http://spark-pi-b6f8a60df70a3b9d869c4e305518f43a-driver-svc.default.svc:4040
2018-03-07 12:39:42 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-03-07 12:39:42 INFO  MemoryStore:54 - MemoryStore cleared
2018-03-07 12:39:42 INFO  BlockManager:54 - BlockManager stopped
2018-03-07 12:39:42 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
2018-03-07 12:39:42 WARN  MetricsSystem:66 - Stopping a MetricsSystem that is not running
2018-03-07 12:39:42 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-03-07 12:39:42 INFO  SparkContext:54 - Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: External scheduler cannot be instantiated
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2747)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:492)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
    at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
    at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [Pod]  with name: [spark-pi-b6f8a60df70a3b9d869c4e305518f43a-driver]  in namespace: [default]  failed.
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:62)
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:71)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:228)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:184)
    at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.<init>(KubernetesClusterSchedulerBackend.scala:70)
    at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:120)
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2741)
    ... 8 more
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
    at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
    at java.net.InetAddress.getAllByName(InetAddress.java:1192)
    at java.net.InetAddress.getAllByName(InetAddress.java:1126)
    at okhttp3.Dns$1.lookup(Dns.java:39)
    at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171)
    at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:137)
    at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:82)
    at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:171)
    at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
    at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
    at okhttp3.RealCall.execute(RealCall.java:69)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:377)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:312)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:295)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:783)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:217)
    ... 12 more
2018-03-07 12:39:42 INFO  ShutdownHookManager:54 - Shutdown hook called
2018-03-07 12:39:42 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-64fe7ad8-669f-4591-a3f6-67440d450a44

So apparently the Kubernetes Scheduler Backend cannot contact the pod because it is unable to resolve kubernetes.default.svc. Hum.. why?

I also configured RBAC with a spark service account as mentionned in the documentation but the same problem occurs. (also tried on a different namespace, same problem)

Here are the logs from kube-dns:

I0306 16:04:04.170889       1 dns.go:555] Could not find endpoints for service "spark-pi-b9e8b4c66fe83c4d94a8d46abc2ee8f5-driver-svc" in namespace "default". DNS records will be created once endpoints show up.
I0306 16:04:29.751201       1 dns.go:555] Could not find endpoints for service "spark-pi-0665ad323820371cb215063987a31e05-driver-svc" in namespace "default". DNS records will be created once endpoints show up.
I0306 16:06:26.414146       1 dns.go:555] Could not find endpoints for service "spark-pi-2bf24282e8033fa9a59098616323e267-driver-svc" in namespace "default". DNS records will be created once endpoints show up.
I0307 08:16:17.404971       1 dns.go:555] Could not find endpoints for service "spark-pi-3887031e031732108711154b2ec57d28-driver-svc" in namespace "default". DNS records will be created once endpoints show up.
I0307 08:17:11.682218       1 dns.go:555] Could not find endpoints for service "spark-pi-3d84127226393fc99e2fe035db56bfb5-driver-svc" in namespace "default". DNS records will be created once endpoints show up. 

I really can't figure out why those errors come up.

Upvotes: 5

Views: 4134

Answers (4)

Ahmed Anis
Ahmed Anis

Reputation: 1

Test if pod with your spark image can resolve dns

  • create test_dns.yaml file:
apiVersion: v1
kind: Pod
metadata:
  name: testdns
  namespace: default
spec:
  containers:
  - name: testdns
    image: <your-spark-image>
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
  • apply it
kubectl apply -f test-dns.yaml
  • run nslookup for kubernetes.default.svc
kubectl exec -ti testdns -- nslookup kubernetes.default.svc
  • run it multiple time to check if your dns run consistently

Upvotes: 0

Alexis
Alexis

Reputation: 5049

To add to openbrace's answer, and based on Ivan Lee's answer too, if you are using minikube, running the following command was enough for me:

kubectl create clusterrolebinding default --clusterrole=edit --serviceaccount=default:default --namespace=default

That way, I didn't have to change spark.kubernetes.authenticate.driver.serviceAccountName when using spark-submit.

Upvotes: 0

openbrace
openbrace

Reputation: 1

I faced the same issue. If you're using minikube. Try deleting minikube using minikube delete and minikube start. Then create serviceaccount and clusterrolebinding

Upvotes: 0

Ivan Lee
Ivan Lee

Reputation: 4261

Try to alter the pod network with one method except Calico, check whether kube-dns work well.

To create a custom service account, a user can use the kubectl create serviceaccount command. For example, the following command creates a service account named spark:

$ kubectl create serviceaccount spark

To grant a service account a Role or ClusterRole, a RoleBinding or ClusterRoleBinding is needed. To create a RoleBinding or ClusterRoleBinding, a user can use the kubectl create rolebinding (or clusterrolebinding for ClusterRoleBinding) command. For example, the following command creates an edit ClusterRole in the default namespace and grants it to the spark service account created above:

$ kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default

Depending on the version and setup of Kubernetes deployed, this default service account may or may not have the role that allows driver pods to create pods and services under the default Kubernetes RBAC policies. Sometimes users may need to specify a custom service account that has the right role granted. Spark on Kubernetes supports specifying a custom service account to be used by the driver pod through the configuration property spark.kubernetes.authenticate.driver.serviceAccountName=. For example to make the driver pod use the spark service account, a user simply adds the following option to the spark-submit command:

spark-submit --master k8s://https://192.168.1.5:6443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=5 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark  --conf spark.kubernetes.container.image=leeivan/spark:latest local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar

Upvotes: 5

Related Questions