Yuejiang_Li

Reputation: 33

Azure HDInsight Jupyter and pyspark not working

I created a HDInsight cluster on azure with the following parameters:

Spark 2.4 (HDI 4.0)

I tried the HDInsight tutorial for Apache Spark with a PySpark Jupyter Notebook, and it worked just fine. But ever since I re-ran the notebook a second time (or started a new one) and ran a simple

from pyspark.sql import *

or other commands, they all end up with

The code failed because of a fatal error:
    Session 7 did not start up in 180 seconds..

Some things to try:
a) Make sure Spark has enough available resources for Jupyter to create a Spark context. For instructions on how to assign resources see http://go.microsoft.com/fwlink/?LinkId=717038
b) Contact your cluster administrator to make sure the Spark magics library is configured correctly.
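The 180-second timeout means the Livy session behind the notebook never reached a usable state. On HDInsight, Livy is reachable at `https://<clustername>.azurehdinsight.net/livy/sessions` (with the cluster's admin credentials), and inspecting the session states there shows whether sessions are dying rather than starting. A minimal sketch of checking the response, assuming the standard Livy REST payload shape (the sample values below are invented):

```python
# Sketch: find Livy sessions that failed to start. The sample payload
# mimics the JSON returned by GET /livy/sessions from the Livy REST API;
# in practice you would fetch it with basic auth from
# https://<clustername>.azurehdinsight.net/livy/sessions (hypothetical cluster name).

def unhealthy_sessions(livy_response):
    """Return (id, state) for sessions that are not starting, idle, or busy."""
    healthy = {"starting", "idle", "busy"}
    return [(s["id"], s["state"])
            for s in livy_response.get("sessions", [])
            if s["state"] not in healthy]

# Illustrative payload resembling Livy's response (values are made up).
sample = {
    "from": 0,
    "total": 2,
    "sessions": [
        {"id": 6, "state": "dead", "kind": "pyspark"},
        {"id": 7, "state": "starting", "kind": "pyspark"},
    ],
}

print(unhealthy_sessions(sample))  # sessions stuck in 'dead'/'error' states
```

A session stuck in `dead` or `error` rather than progressing to `idle` points at the cluster side (YARN, resources) rather than at the notebook itself.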

After this, I also tried pyspark over SSH. I connected to the cluster through SSH and ran

$ pyspark

It shows the following information

SPARK_MAJOR_VERSION is set to 2, using Spark2
Python 2.7.12 |Anaconda custom (64-bit)| (default, Jul  2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

and it got stuck right there.
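A pyspark shell that hangs after the log banner is typically waiting for YARN to grant the ApplicationMaster container. YARN's ResourceManager exposes cluster capacity at the REST endpoint `/ws/v1/cluster/metrics`; a sketch of the check, assuming the standard response shape (the sample numbers below are invented):

```python
# Sketch: decide whether YARN has room for a new Spark application, using
# the shape of the ResourceManager REST response from
# GET http://<headnode>:8088/ws/v1/cluster/metrics (sample values are invented).

def has_capacity(metrics, needed_mb=1024, needed_vcores=1):
    """True if YARN reports enough free memory, cores, and at least one live node."""
    m = metrics["clusterMetrics"]
    return (m["availableMB"] >= needed_mb
            and m["availableVirtualCores"] >= needed_vcores
            and m["activeNodes"] > 0)

sample = {
    "clusterMetrics": {
        "availableMB": 0,           # nothing free: new sessions will hang
        "availableVirtualCores": 0,
        "activeNodes": 0,           # e.g. NodeManagers stopped
    }
}

print(has_capacity(sample))  # False when YARN is down or saturated
```

Zero active nodes or zero available memory would explain both the notebook timeout and the hanging shell: the Spark context simply never gets its containers.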

Am I missing a step, or is this a bug? How can I fix this problem?

Upvotes: 0

Views: 900

Answers (1)

CHEEKATLAPRADEEP

Reputation: 12768

In my observation, you get this error message when there is an issue with the YARN service, for example when YARN is stopped.

ERROR: First, I stopped the YARN service.


Then I opened a Jupyter notebook and ran the same query, and got the same error message as yours.


WALKTHROUGH: ERROR MESSAGE


SUCCESS: All Ambari services are running without any issue.


To run "Jupyter Notebook" queries successfully, make sure all the services are running without any issues.
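Service states can also be checked without the Ambari UI, via the Ambari REST API at `https://<clustername>.azurehdinsight.net/api/v1/clusters/<cluster>/services?fields=ServiceInfo/state` (Ambari reports a stopped service as `INSTALLED` and a running one as `STARTED`). A sketch of scanning the response, assuming the standard payload shape (the sample response below is illustrative):

```python
# Sketch: list Ambari services that are not running, based on the shape of
# GET .../api/v1/clusters/<cluster>/services?fields=ServiceInfo/state
# (the sample response below is illustrative, not from a real cluster).

def stopped_services(ambari_response):
    """Return names of services whose Ambari state is not STARTED."""
    return [item["ServiceInfo"]["service_name"]
            for item in ambari_response.get("items", [])
            if item["ServiceInfo"]["state"] != "STARTED"]

sample = {
    "items": [
        {"ServiceInfo": {"service_name": "SPARK2", "state": "STARTED"}},
        {"ServiceInfo": {"service_name": "YARN", "state": "INSTALLED"}},  # stopped
        {"ServiceInfo": {"service_name": "LIVY", "state": "STARTED"}},
    ],
}

print(stopped_services(sample))  # any service here needs to be started
```

If YARN shows up in that list, starting it from Ambari (and restarting any stale dependent services) should let the notebook's Spark session start again.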


WALKTHROUGH: SUCCESS MESSAGE


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Here are the steps to create a Jupyter notebook and run queries on Azure HDInsight Spark cluster:

Go to the Azure Portal => Cluster Dashboards => Select Jupyter Notebook => Create a PySpark notebook => Execute the queries as shown.


You can also use the interactive Apache Spark shell for running PySpark (Python) queries:


Reference: https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-shell

Upvotes: 1
