Henrique Branco

Reputation: 1940

SparkException: Python worker failed to connect back when execute spark action

When I try to execute this line in PySpark:

arquivo = sc.textFile("dataset_analise_sentimento.csv")

I got the following error message:

Py4JJavaError: An error occurred while calling z:
org.apache.spark.api.python.PythonRDD.runJob.: 
org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 0 in stage 0.0 failed 1 times, most recent failure:
Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver):
org.apache.spark.SparkException: Python worker failed to connect back.

I have tried several suggested fixes, but none of them worked for me and I can't find a solution.

I'm currently using the following versions:

Python 3.7.3, Java JDK 11.0.6, Windows 10, Apache Spark 2.3.4

Upvotes: 5

Views: 9118

Answers (2)

TheZing

Reputation: 11

I got the same error. I resolved it by pointing the Java classpath to JDK 11.


Upvotes: 1

Henrique Branco

Reputation: 1940

I just configured the following environment variables, and now it's working normally:

  • HADOOP_HOME = C:\Hadoop
  • JAVA_HOME = C:\Java\jdk-11.0.6
  • PYSPARK_DRIVER_PYTHON = jupyter
  • PYSPARK_DRIVER_PYTHON_OPTS = notebook
  • PYSPARK_PYTHON = python
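As a sketch, the same variables can also be set from Python itself before Spark starts, which is handy for checking they are picked up inside a notebook. The paths below are the example values from this answer; adjust them to your own install locations:

```python
import os

# Example paths from the list above; replace with your own install locations.
os.environ["HADOOP_HOME"] = r"C:\Hadoop"
os.environ["JAVA_HOME"] = r"C:\Java\jdk-11.0.6"

# Launch the PySpark driver inside Jupyter Notebook.
os.environ["PYSPARK_DRIVER_PYTHON"] = "jupyter"
os.environ["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"

# Make the Spark workers use the same "python" interpreter as the driver;
# a worker/driver Python mismatch is a common cause of
# "Python worker failed to connect back".
os.environ["PYSPARK_PYTHON"] = "python"
```

Note that these must be set before the SparkContext (and its JVM) is created, e.g. in the first cell of the notebook or as system-wide environment variables.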

I'm currently using the following versions:

Python 3.7.3, Java JDK 11.0.6, Windows 10, Apache Spark 2.4.3 and using Jupyter Notebook with pyspark.

Upvotes: 4
