haihong zeng

Reputation: 71

AttributeError: 'NoneType' object has no attribute 'sc'

Excuse me. Today I wanted to run a program that creates a DataFrame with sqlContext in PySpark. The result is an AttributeError: "AttributeError: 'NoneType' object has no attribute 'sc'". My computer runs Windows 7, the Spark version is 1.6.0, and the API is Python 3. I have googled several times and read the Spark Python API docs, but I could not solve the problem, so I am asking for your help.

My code is:

   #python version is 3.5
   sc.stop()
   import pandas as pd
   import numpy as np
   sc=SparkContext("local","app1"
   data2=[("a",5),("b",5),("a",5)]
   df=sqlContext.createDataFrame(data2)

And the result is:


    AttributeError                            Traceback (most recent call last)
    <ipython-input-19-030b8faadb2c> in <module>()
          5 data2=[("a",5),("b",5),("a",5)]
          6 print(data2)
    ----> 7 df=sqlContext.createDataFrame(data2)

    D:\spark\spark-1.6.0-bin-hadoop2.6\python\pyspark\sql\context.py in createDataFrame(self, data, schema, samplingRatio)
        426             rdd, schema = self._createFromRDD(data, schema, samplingRatio)
        427         else:
    --> 428             rdd, schema = self._createFromLocal(data, schema)
        429         jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
        430         jdf = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json())

    D:\spark\spark-1.6.0-bin-hadoop2.6\python\pyspark\sql\context.py in _createFromLocal(self, data, schema)
        358         # convert python objects to sql data
        359         data = [schema.toInternal(row) for row in data]
    --> 360         return self._sc.parallelize(data), schema
        361
        362     @since(1.3)

    D:\spark\spark-1.6.0-bin-hadoop2.6\python\pyspark\context.py in parallelize(self, c, numSlices)
        410         [[], [0], [], [2], [4]]
        411         """
    --> 412         numSlices = int(numSlices) if numSlices is not None else self.defaultParallelism
        413         if isinstance(c, xrange):
        414             size = len(c)

    D:\spark\spark-1.6.0-bin-hadoop2.6\python\pyspark\context.py in defaultParallelism(self)
        346         reduce tasks)
        347         """
    --> 348         return self._jsc.sc().defaultParallelism()
        349
        350     @property

    AttributeError: 'NoneType' object has no attribute 'sc'

I am confused because I did create the "sc"; why does it still show the error "'NoneType' object has no attribute 'sc'"?

Upvotes: 7

Views: 23877

Answers (3)

Tariq AlAbdulrahman

Reputation: 1

I believe we get the error "AttributeError: 'NoneType' object has no attribute 'sc'" because two SparkContexts were running at the same time.

"A SparkContext represents the connection to a Spark cluster, and can be used to create RDD and broadcast variables on that cluster."

also

"Only one SparkContext should be active per JVM. You must stop() the active SparkContext before creating a new one. SparkContext instance is not supported to share across multiple processes out of the box, and PySpark does not guarantee multi-processing execution. Use threads instead for concurrent processing purpose."

In my experience the error was resolved when I restarted my Jupyter kernel and only ran one SparkContext.

Source of quotes: https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.html
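A minimal sketch of that idea (assuming PySpark 1.x, and that sc is the context the notebook already created, as in the question): stop the old context, create exactly one new one, and rebuild sqlContext against it rather than reusing the one bound to the stopped context:

from pyspark import SparkContext, SQLContext

sc.stop()                           # stop the context the notebook created earlier
sc = SparkContext("local", "app1")  # exactly one active SparkContext
sqlContext = SQLContext(sc)         # rebuild SQLContext so it uses the new, live context

df = sqlContext.createDataFrame([("a", 5), ("b", 5), ("a", 5)])
df.show()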

Upvotes: 0

Soerendip

Reputation: 9138

Here is a minimal example that worked for me. I am not sure why you imported pandas in the first place if you are not using it afterwards; I guess your intention was to create a DataFrame from a pandas object. So here is an example that generates a Spark DataFrame from a pandas DataFrame.

import pandas as pd
from pyspark import SparkContext, SQLContext

df = pd.DataFrame({'x': [1, 2, 3]})   # a small pandas DataFrame
sc = SparkContext.getOrCreate()       # reuse the running context or start a new one
sqlContext = SQLContext(sc)
sqlContext.createDataFrame(df)        # convert it to a Spark DataFrame
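For completeness, a small usage sketch with the original list-of-tuples data (the column names here are just illustrative):

data2 = [("a", 5), ("b", 5), ("a", 5)]
spark_df = sqlContext.createDataFrame(data2, ["letter", "value"])  # optional column names
spark_df.show()
pandas_df = spark_df.toPandas()  # convert back to pandas if needed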

I am running Spark in a Jupyter notebook too.

Upvotes: 1

Assaf Mendelson

Reputation: 12991

This should work (except that your code is missing a ')' at the end of the sc creation, which I imagine is a typo). You can try creating sc as follows:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("app1").setMaster("local")
sc = SparkContext(conf=conf)

BTW, sc.stop() means you already have a SparkContext, which is true if you used pyspark but not if you use spark-submit. It is better to use SparkContext.getOrCreate(), which works in both cases.
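For illustration, a sketch of that suggestion (assuming PySpark 1.x; the same lines should work whether the script is launched through pyspark or spark-submit):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("app1").setMaster("local")
sc = SparkContext.getOrCreate(conf=conf)   # reuses an existing context or creates a new one
sqlContext = SQLContext(sc)                # recreate SQLContext against this context

data2 = [("a", 5), ("b", 5), ("a", 5)]
df = sqlContext.createDataFrame(data2)
df.show()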

Upvotes: 1
