Reputation: 113
I have a standalone Spark 2.4.0 cluster to which I need to deploy an app, passing some extra Java options (to both the driver and the executors).
To do that I use spark.driver.extraJavaOptions and spark.executor.extraJavaOptions, described here.
It works perfectly fine in client mode, but in cluster mode the options are not passed to the driver (for the executors it still works fine).
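For illustration, this is roughly how I submit (the jar path, class name and the actual JVM options below are placeholders):

    # In client mode both options take effect; in cluster mode only the
    # executor option reaches its JVM - the driver option is lost.
    spark-submit \
      --master spark://localhost:7077 \
      --deploy-mode cluster \
      --conf "spark.driver.extraJavaOptions=-Dmy.prop=driver-value" \
      --conf "spark.executor.extraJavaOptions=-Dmy.prop=executor-value" \
      --class com.example.MyApp \
      /path/to/my-app.jar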
I was facing similar issues with spark.driver.extraClassPath as well, so I guess the problem is more generic.
Anyway, I've managed to find a solution for that: enabling spark.master.rest.enabled on the master (since 2.4.0 it's false by default, true in older releases - see the PR).
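Concretely: set spark.master.rest.enabled to true in conf/spark-defaults.conf on the master, restart it, and then submit against the REST port (6066 by default, configurable via spark.master.rest.port) instead of 7077. A sketch, with placeholder host, class and paths:

    # Cluster-mode submission through the REST submission server.
    spark-submit \
      --master spark://localhost:6066 \
      --deploy-mode cluster \
      --conf "spark.driver.extraJavaOptions=-Dmy.prop=driver-value" \
      --class com.example.MyApp \
      /path/to/my-app.jar

With that, the extra Java options reach the driver as expected.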
Questions:

1. I was not able to find anywhere in the documentation that we actually need to deploy via REST when using cluster mode to make spark.driver.extraJavaOptions (and similar options) work as expected. The official docs don't mention it. Is it documented anywhere else, or am I missing something obvious?
2. I guess submitting in cluster mode is quite a common use case. If doing it properly requires the REST submission server (please correct me if I'm wrong), why was it disabled by default?
3. When I try to submit the regular way (port 7077) with spark.master.rest.enabled set to true, I get the following in the logs:
Warning: Master endpoint spark://localhost:7077 was not a REST server. Falling back to legacy submission gateway instead.
Judging by that, I would say that not submitting via REST is in general the legacy path; but again, it's not documented anywhere, and why would they disable REST submission by default (see my 2nd question)?
Conversely, when I point the submission at port 6066 instead, I get:

StandaloneAppClient$ClientEndpoint:87 - Failed to connect to master localhost:6066
Does that mean we always have to switch ports when we change deploy mode? What's the point? Why can't we have one way to deploy our app?

Upvotes: 1
Views: 1983
Reputation: 113
Looks like it is a bug reported in the Spark Jira.
A PR with the fix was raised; hopefully it will be merged soon.
Upvotes: 0
Reputation: 842
I am far from being an expert, and I haven't worked with 2.4 yet either, but I will share what I know.
I don't remember having problems with the classpath, but that doesn't say much. I mostly use the REST API, and with cluster mode. Just to be sure: the jar paths do start with "local:/", right?
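Something like this, I mean (the path is made up and must exist on every node):

    # "local:/" tells Spark the jar is already present at this absolute
    # path on each cluster node, so nothing is shipped at submit time.
    spark-submit \
      --master spark://localhost:6066 \
      --deploy-mode cluster \
      --class com.example.MyApp \
      local:/opt/apps/my-app.jar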
AFAIK the REST endpoint is also known as the "Spark hidden API", which could explain the "not able to find it in the documentation" part.
In my opinion, the REST API is not secured in any way, which might be the reason it was hidden? But I'm glad to hear that at least it is now disabled by default; I think it was enabled by default in earlier versions.
The "Falling back to legacy submission gateway instead" rings a bell so I think it is ok (didn't have problems with extra class path)
I don't think the REST API supports client mode. How could it? Jetty runs on the master, handling the submit request; I don't see how it could then start a driver process on the calling host.
As for the missing jars on the classpath, have you tried spark.jars?
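For example (paths and class name are made up):

    # spark.jars distributes the listed jars to the driver and executors
    # and adds them to both classpaths.
    spark-submit \
      --master spark://localhost:6066 \
      --deploy-mode cluster \
      --conf "spark.jars=/path/to/dep1.jar,/path/to/dep2.jar" \
      --class com.example.MyApp \
      /path/to/my-app.jar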
If all else fails... try an uber jar :-)
Upvotes: 0