Reputation: 4373
I am using Spark version 2.0.1
def f(l):
    print(l.b_appid)

sqlC = SQLContext(spark)
mrdd = sqlC.read.parquet("hdfs://localhost:54310/yogi/device/processed//data.parquet")
mrdd.forearch(f) <== this gives error
Upvotes: 1
Views: 16272
Reputation: 10450
In Spark 2.X, in order to use the Spark Session (aka spark) you need to create it. You can create a SparkSession like this:
from pyspark.sql import SparkSession
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .getOrCreate()
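As a side note, getOrCreate() reuses an existing session if one is already running (for example, the spark object that the pyspark shell creates for you), so calling it more than once is safe. A minimal sketch, with illustrative variable names:

same_session = SparkSession.builder.getOrCreate()
print(same_session is spark)   # True: both names refer to the same session
print(spark.version)           # e.g. '2.0.1'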
Once you have the SparkSession object (spark), you can use it like this:
mydf = spark.read.parquet("hdfs://localhost:54310/yogi/device/processed//data.parquet")
mydf.foreach(f)
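Putting the pieces together, a corrected version of the script from the question could look like the sketch below; b_appid comes from the question and is assumed to be a column of that parquet file, and note that the method is foreach, not forearch:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .getOrCreate()

def f(l):
    # l is a Row; b_appid is assumed to be a column in the parquet data
    print(l.b_appid)

mydf = spark.read.parquet("hdfs://localhost:54310/yogi/device/processed//data.parquet")
mydf.foreach(f)   # f runs on the executors; in cluster mode the print output appears in the executor logs, not on the driver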
More info can be found in the SparkSession section of the Spark docs:
class pyspark.sql.SparkSession(sparkContext, jsparkSession=None)
The entry point to programming Spark with the Dataset and DataFrame API. A SparkSession can be used to create DataFrames, register DataFrames as tables, execute SQL over tables, cache tables, and read parquet files. To create a SparkSession, use the following builder pattern:
spark = SparkSession.builder \
.master("local") \
.appName("Word Count") \
.config("spark.some.config.option", "some-value") \
.getOrCreate()
Info about the builder can be found in the docs for class Builder (the builder for SparkSession).
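To illustrate the "register DataFrames as tables, execute SQL over tables" part of the quote above, here is a small sketch that reuses the session and the parquet path from the question; the view name devices is arbitrary and b_appid is assumed to be one of the columns:

df = spark.read.parquet("hdfs://localhost:54310/yogi/device/processed//data.parquet")
df.createOrReplaceTempView("devices")            # register the DataFrame as a temp view
spark.sql("SELECT b_appid FROM devices").show()  # run SQL through the same session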
Upvotes: 4