Reputation: 33
I created an HDInsight cluster on Azure with the following parameters:
Spark 2.4 (HDI 4.0)
I followed the HDInsight tutorial for Apache Spark with a PySpark Jupyter Notebook, and it worked just fine. But ever since I re-ran the notebook a second time (or started a new one), running even a simple
from pyspark.sql import *
or any other command ends up with
The code failed because of a fatal error:
Session 7 did not start up in 180 seconds..
Some things to try:
a) Make sure Spark has enough available resources for Jupyter to create a Spark context. For instructions on how to assign resources see http://go.microsoft.com/fwlink/?LinkId=717038
b) Contact your cluster administrator to make sure the Spark magics library is configured correctly.
After this, I also tried pyspark over SSH. When I connected to the cluster through SSH and ran
$ pyspark
it printed the following output:
SPARK_MAJOR_VERSION is set to 2, using Spark2
Python 2.7.12 |Anaconda custom (64-bit)| (default, Jul 2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
and then it hung right there, never reaching a prompt.
Am I missing a step, or is this a bug? How can I fix this problem?
Upvotes: 0
Views: 900
Reputation: 12768
In my observation, you get this error message when there is an issue with the YARN service, for example when YARN is stopped.
ERROR: To reproduce the problem, I first stopped the YARN service.
Then I opened a Jupyter notebook and ran the same query, and got the same error message as yours.
[Screenshot: walkthrough of the error message]
SUCCESS: All Ambari services are running without any issue.
To run Jupyter Notebook queries successfully, make sure all the Ambari services (in particular YARN and Spark) are running without any issue.
[Screenshot: walkthrough of the success message]
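If you want to verify this from code rather than from the Ambari UI, the sketch below checks the YARN service state through the Ambari REST API, which HDInsight exposes on the cluster endpoint. This is a minimal sketch and not part of the walkthrough above: CLUSTERNAME, ADMIN_USER, and ADMIN_PASSWORD are placeholders for your own cluster's values.

import requests

CLUSTERNAME = "mycluster"      # placeholder: your cluster name
ADMIN_USER = "admin"           # placeholder: cluster (Ambari) login user
ADMIN_PASSWORD = "<password>"  # placeholder: cluster login password

# Ambari reports each service's state under ServiceInfo.state.
url = ("https://{0}.azurehdinsight.net"
       "/api/v1/clusters/{0}/services/YARN").format(CLUSTERNAME)
resp = requests.get(url, auth=(ADMIN_USER, ADMIN_PASSWORD))
resp.raise_for_status()

# A healthy cluster reports "STARTED"; "INSTALLED" means the service is stopped.
print("YARN service state: " + resp.json()["ServiceInfo"]["state"])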
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Here are the steps to create a Jupyter notebook and run queries on an Azure HDInsight Spark cluster:
Go to the Azure Portal => open Cluster Dashboards => select Jupyter Notebook => create a PySpark notebook => and execute your queries there.
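For example, a first cell like the one below verifies that the session starts and can query the cluster. This is a minimal sketch in the spirit of the tutorial: hivesampletable is the sample Hive table that ships with HDInsight clusters, and spark is the SparkSession that the notebook's Spark magics create for you.

from pyspark.sql import *

# Query the sample Hive table that ships with HDInsight clusters;
# `spark` is provided automatically by the PySpark notebook kernel.
df = spark.sql("SELECT clientid, querytime, market FROM hivesampletable LIMIT 10")
df.show()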
You can also use the interactive Apache Spark shell for running PySpark (Python) queries:
Reference: https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-shell
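Once the services are healthy, the pyspark shell from the question should get past the startup messages and reach a >>> prompt. As a quick sanity check (a sketch to run inside the shell, where spark is predefined), a trivial job confirms that YARN can actually allocate a session:

# Run inside the pyspark shell; `spark` is provided by the shell itself.
spark.range(1000).count()   # returns 1000 once executors are up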
Upvotes: 1