dead programmer

Reputation: 4373

pyspark error: AttributeError: 'SparkSession' object has no attribute 'serializer'

I am using Spark version 2.0.1.

def f(l):
    print(l.b_appid)

sqlC=SQLContext(spark)
mrdd = sqlC.read.parquet("hdfs://localhost:54310/yogi/device/processed//data.parquet")
mrdd.forearch(f) <== this gives error

Upvotes: 1

Views: 16272

Answers (1)

Yaron

Reputation: 10450

In Spark 2.x, in order to use the Spark session (conventionally named spark), you first need to create it.

You can create a SparkSession like this:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .getOrCreate()

Once you have the SparkSession object (spark), you can use it like this:

mydf = spark.read.parquet("hdfs://localhost:54310/yogi/device/processed//data.parquet")
mydf.foreach(f)
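
For reference, here is the complete flow in one runnable sketch. The HDFS path and the b_appid column come from the question; whether that column actually exists in the data is an assumption. Note that foreach runs f on the executors, so the print output shows up in the executor logs rather than in the driver console. (The AttributeError in the question most likely came from passing the SparkSession into SQLContext, whose first argument is expected to be a SparkContext; in Spark 2.x you can read the file straight from the session and skip SQLContext entirely.)

from pyspark.sql import SparkSession

def f(l):
    print(l.b_appid)  # runs on the executors; output lands in the executor logs

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .getOrCreate()

# Read the parquet file through the session itself - no SQLContext needed
mydf = spark.read.parquet("hdfs://localhost:54310/yogi/device/processed//data.parquet")
mydf.foreach(f)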

More info can be found in the SparkSession section of the Spark docs:

class pyspark.sql.SparkSession(sparkContext, jsparkSession=None)

The entry point to programming Spark with the Dataset and DataFrame API. A SparkSession can be used to create DataFrames, register DataFrames as tables, execute SQL over tables, cache tables, and read parquet files. To create a SparkSession, use the following builder pattern:

spark = SparkSession.builder \
    .master("local") \
    .appName("Word Count") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

Info about the builder can be found in the Builder class documentation (Builder for SparkSession).
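
To make the quoted docs concrete, here is a small sketch of the other capabilities they list - creating a DataFrame, registering it as a table, and executing SQL over it. The data and names here (the words view and the word/cnt columns) are made up for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("local") \
    .appName("Word Count") \
    .getOrCreate()

# Create a DataFrame from local data (hypothetical rows)
df = spark.createDataFrame([("spark", 3), ("hadoop", 1)], ["word", "cnt"])

# Register it as a temporary view and query it with SQL
df.createOrReplaceTempView("words")
spark.sql("SELECT word, cnt FROM words WHERE cnt > 1").show()

# Cache the underlying table if it will be reused
spark.catalog.cacheTable("words")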

Upvotes: 4
