Reputation: 245
Since updating to Spark 2.3.0, tests run in my CI (Semaphore) fail due to an allegedly invalid Spark URL when creating the (local) Spark context:
18/03/07 03:07:11 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@LXC_trusty_1802-d57a40eb:44610
at org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66)
at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:134)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:32)
at org.apache.spark.executor.Executor.<init>(Executor.scala:155)
at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:59)
at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:126)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
The Spark session is created as follows:
import org.apache.spark.sql.SparkSession

val sparkSession: SparkSession = SparkSession
.builder
.appName("LocalTestSparkSession")
.config("spark.broadcast.compress", "false")
.config("spark.shuffle.compress", "false")
.config("spark.shuffle.spill.compress", "false")
.master("local[3]")
.getOrCreate
Before updating to Spark 2.3.0, no problems were encountered in versions 2.2.1 and 2.1.0. Also, running the tests locally works fine.
Upvotes: 18
Views: 18349
Reputation: 31
Setting .config("spark.driver.host", "localhost")
fixed the issue for me.
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession
.builder()
.config("spark.master", "local")
.config("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
.config("spark.hadoop.fs.s3a.buffer.dir", "/tmp")
.config("spark.driver.memory", "2048m")
.config("spark.executor.memory", "2048m")
.config("spark.driver.bindAddress", "127.0.0.1")
.config("spark.driver.host", "localhost")
.getOrCreate();
Upvotes: 3
Reputation: 141
If you don't want to change the environment variable, you can add the config to the SparkSession builder in code (as Hanisha said above).
In PySpark:
from pyspark.sql import SparkSession

spark = SparkSession.builder.config("spark.driver.host", "localhost").getOrCreate()
Upvotes: 11
Reputation: 1865
As mentioned in the answers above, you need to change SPARK_LOCAL_HOSTNAME to localhost. On Windows, you can use the SET command:
SET SPARK_LOCAL_HOSTNAME=localhost
However, SET is temporary: you would have to run it again in every new terminal. Instead, you can use the SETX command, which is permanent:
SETX SPARK_LOCAL_HOSTNAME localhost
You can run the above command from any command prompt. Note that unlike SET, SETX does not accept an equals sign; separate the environment variable and the value with a space.
If it succeeds, you will see a message like "SUCCESS: Specified value was saved".
You can verify that the variable was added by typing SET in a different command prompt (or SET s, which lists the variables starting with the letter 'S'). You will see SPARK_LOCAL_HOSTNAME=localhost in the results, which would not be the case had you used SET instead of SETX.
Upvotes: 4
Reputation: 4050
For anyone working in a Jupyter Notebook: adding %env SPARK_LOCAL_HOSTNAME=localhost at the very beginning of the cell solved it for me. Like so:
%env SPARK_LOCAL_HOSTNAME=localhost
import findspark
findspark.init()
from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("Test")
sc = SparkContext(conf = conf)
Upvotes: 1
Reputation: 542
I would like to complement @Prakash Annadurai's answer by saying:
If you want the variable setting to persist after exiting the terminal, add it to your shell profile (e.g. ~/.bash_profile) with the same command:
export SPARK_LOCAL_HOSTNAME=localhost
Upvotes: 0
Reputation: 848
Change your hostname so that it has NO underscore:
spark://HeartbeatReceiver@LXC_trusty_1802-d57a40eb:44610 becomes spark://HeartbeatReceiver@LXCtrusty1802d57a40eb:44610
On Ubuntu, as root:
#hostnamectl status
#hostnamectl --static set-hostname LXCtrusty1802d57a40eb
#nano /etc/hosts
127.0.0.1 LXCtrusty1802d57a40eb
#reboot
Upvotes: 1
Reputation: 1440
This was resolved by setting the SparkSession config "spark.driver.host" to the IP address.
It seems that this change is required from 2.3 onwards.
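A minimal Scala sketch of this fix, assuming a local master (the app name and the loopback IP are placeholders):
import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession
.builder
.appName("DriverHostFix")
.master("local[*]")
// an explicit driver host lets Spark 2.3+ build a valid RPC URL
.config("spark.driver.host", "127.0.0.1")
.getOrCreate()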
Upvotes: 11
Reputation: 82
Try to run Spark locally, with as many worker threads as logical cores on your machine:
.master("local[*]")
Upvotes: 0
Reputation: 327
Change SPARK_LOCAL_HOSTNAME to localhost and try:
export SPARK_LOCAL_HOSTNAME=localhost
Upvotes: 28