Reputation: 25
Code:
from pyspark.mllib.classification import LabeledPoint, NaiveBayes
from pyspark import SparkContext as sc
data = [
    LabeledPoint(0.0, [0.0, 0.0]),
    LabeledPoint(0.0, [0.0, 1.0]),
    LabeledPoint(1.0, [1.0, 0.0])]
model = NaiveBayes.train(sc.parallelize(data))
model.predict(array([0.0, 1.0]))
model.predict(array([1.0, 0.0]))
model.predict(sc.parallelize([[1.0, 0.0]])).collect()
Upvotes: 0
Views: 496
Reputation: 13831
The problem here is the import on line two of your example:

from pyspark import SparkContext as sc

This overwrites the predefined SparkContext instance (stored in sc) with the SparkContext class itself, causing the later sc.parallelize() call to fail.
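You can see the failure in isolation (a minimal sketch; the exact error message depends on your Python version):

from pyspark import SparkContext as sc

# `sc` is now the SparkContext class, not a running context instance,
# so calling an instance method on it fails immediately:
sc.parallelize([1.0, 0.0])
# TypeError: parallelize() missing 1 required positional argument: 'c'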
In Databricks, you don't need to create the SparkContext yourself; it's automatically predefined as sc in notebooks. See https://docs.databricks.com/user-guide/getting-started.html#predefined-variables for a more complete list of predefined variables in Databricks.
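Dropping the import and relying on the predefined sc fixes your example. Here is a corrected sketch (it also adds the from numpy import array that the original snippet calls but never imports):

from numpy import array
from pyspark.mllib.classification import LabeledPoint, NaiveBayes

# Do not import SparkContext as sc; use the `sc` predefined by Databricks.
data = [
    LabeledPoint(0.0, [0.0, 0.0]),
    LabeledPoint(0.0, [0.0, 1.0]),
    LabeledPoint(1.0, [1.0, 0.0])]
model = NaiveBayes.train(sc.parallelize(data))
model.predict(array([0.0, 1.0]))   # expected: 0.0
model.predict(array([1.0, 0.0]))   # expected: 1.0
model.predict(sc.parallelize([[1.0, 0.0]])).collect()  # expected: [1.0]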
Upvotes: 1