user3739594

Reputation: 11

Spark Application Driver failing to run a spark submit request

I am running k3s on a cluster of Raspberry Pis. I was able to run the Spark operator on the cluster and do a simple test, but when I moved on to a more realistic use case I ran into issues with adding volume mounts. I got a little further by installing the operator with the following command:

helm install my-release ./spark-operator --namespace spark-operator  --set "sparkJobNamespaces={spark-operator}" --set webhook.enable=true --set webhook.port=443

This seems to have helped with the volume mounts, but with webhook.enable=true I am now running into what looks like an authentication problem.

When the driver starts up and begins processing the request, it fails while getting the Spark context, at this point:

Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://raspberrypi-1:6443/api/v1/namespaces/spark-operator/pods/spark-example-driver. Message: Unauthorized. Received status: Status(apiVersion=v1, code=401, details=null, kind=Status, message=Unauthorized, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Unauthorized, status=Failure, additionalProperties={}).
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.requestFailure(OperationSupport.java:671)
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.requestFailure(OperationSupport.java:651)
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.assertResponseCode(OperationSupport.java:600)
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.lambda$handleResponse$0(OperationSupport.java:560)
    at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.complete(Unknown Source)
    at io.fabric8.kubernetes.client.http.StandardHttpClient.lambda$completeOrCancel$10(StandardHttpClient.java:140)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.complete(Unknown Source)
    at io.fabric8.kubernetes.client.http.ByteArrayBodyHandler.onBodyDone(ByteArrayBodyHandler.java:52)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.complete(Unknown Source)
    at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl$OkHttpAsyncBody.doConsume(OkHttpClientImpl.java:137)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
24/06/25 21:42:55 INFO ShutdownHookManager: Shutdown hook called

As far as I can tell, the issue may stem from these lines:

24/06/25 21:42:50 WARN Config: Error reading service account token from: [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring.
24/06/25 21:42:54 WARN Config: Error reading service account token from: [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring.

When I create a simple app that sleeps in intervals, I am able to exec into the pod and see that the token files exist but are accessible by root only. So my theory is that the spark user in the pod cannot read the token, and the driver therefore gets this 401 error. I am a bit stuck and have spent 4 days trying to debug this.
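To make the theory above concrete, this is roughly the check that could be run from inside the pod as the same user the driver runs as. It is only a sketch: the token path is the one from the WARN lines, while the check_token helper and the TOKEN_PATH variable are names made up for this illustration.

```shell
#!/bin/sh
# Default mount path of the service account token
# (the same path that appears in the WARN lines).
TOKEN_PATH="${TOKEN_PATH:-/var/run/secrets/kubernetes.io/serviceaccount/token}"

# Hypothetical helper: reports whether the current user can read a file.
check_token() {
  if [ ! -e "$1" ]; then
    echo "missing"
  elif [ -r "$1" ]; then
    echo "readable"
  else
    echo "unreadable"
  fi
}

# Show which UID/GID and supplementary groups the process runs with.
echo "user: $(id -u):$(id -g) groups: $(id -G)"

# -L follows symlinks, so the mode and owner of the real token file are shown.
ls -lL "$TOKEN_PATH" 2>/dev/null || true

echo "token: $(check_token "$TOKEN_PATH")"
```

If this prints "unreadable" for the non-root user but "readable" when run as root, that would be consistent with the permission theory. Note that root can read any file, so the check only tells something when run as the spark user.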

With the help of ChatGPT I was able to verify that the service account, roles, role bindings, and cluster role bindings are in order. I've tried changing the run-as user, but nothing works.
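For reference, the kind of RBAC check meant here can be sketched with kubectl auth can-i, impersonating the driver's service account. The namespace is the one from the helm command; the service account name "spark" and the sa_subject helper are assumptions for this illustration, so substitute whatever account the driver actually uses.

```shell
#!/bin/sh
NAMESPACE="spark-operator"                    # namespace from the helm command
SERVICE_ACCOUNT="${SERVICE_ACCOUNT:-spark}"   # hypothetical name, adjust as needed

# Hypothetical helper: builds the user name Kubernetes assigns to a service account.
sa_subject() {
  printf 'system:serviceaccount:%s:%s' "$1" "$2"
}

# Ask the API server whether that service account may GET pods in the namespace.
# can-i exits non-zero on "no", so "|| true" keeps the script going either way.
if command -v kubectl >/dev/null 2>&1; then
  kubectl auth can-i get pods \
    --namespace "$NAMESPACE" \
    --as "$(sa_subject "$NAMESPACE" "$SERVICE_ACCOUNT")" || true
fi
```

A "yes" here would fit with the bindings being in order. It would also fit the 401 in the stack trace, since a 401 means the request was not authenticated at all, whereas a missing role or binding would normally show up as a 403 Forbidden.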

Upvotes: 0

Views: 436

Answers (0)
