Reputation: 43
I am testing this Scala code, which I found in the Machine Learning Library (MLlib) Main Guide:
import org.apache.spark.ml.linalg.{Matrix, Vectors, Vector}
import org.apache.spark.ml.stat.Correlation
import org.apache.spark.sql.Row
import scala.collection.Seq

object BasicStatistics {
  def main(args: Array[String]): Unit = {
    val data: Seq[Vector] = Seq(
      Vectors.sparse(4, Seq((0, 1.0), (3, -2.0))),
      Vectors.dense(4.0, 5.0, 0.0, 3.0),
      Vectors.dense(6.0, 7.0, 0.0, 8.0),
      Vectors.sparse(4, Seq((0, 9.0), (3, 1.0))))
    val df = data.map(Tuple1.apply).toDF("features")
    val Row(coeff1: Matrix) = Correlation.corr(df, "features").head
    println(s"Pearson correlation matrix:\n $coeff1")
    val Row(coeff2: Matrix) = Correlation.corr(df, "features", "spearman").head
    println(s"Spearman correlation matrix:\n $coeff2")
  }
}
But this line reports an error:
val df = data.map(Tuple1.apply).toDF("features")
It says, "value toDF is not a member of Seq[(org.apache.spark.ml.linalg.Vector,)]"
Seems like the mapped Seq (built from data, a Seq[Vector]) does not have a toDF method?
Any ideas on how to proceed?
Below is from my pom.xml
<dependencies>
  <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-mllib -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>2.3.0</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.0</version>
  </dependency>
</dependencies>
Upvotes: 0
Views: 220
Reputation: 2424
This happens because the implicit conversion that adds toDF to a Scala Seq is not in scope. To fix the problem, add these lines before you call toDF:
import org.apache.spark.sql.SparkSession

// Start (or reuse) a local SparkSession.
val name = "application name"
val spark = SparkSession
  .builder
  .appName(name)
  .master("local")
  .getOrCreate()
// Brings toDF (and the other conversions for local collections) into scope.
import spark.implicits._
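For reference, here is a minimal sketch of the full program with the fix applied (the app name here is arbitrary); the session must be created before its implicits are imported:

import org.apache.spark.ml.linalg.{Matrix, Vector, Vectors}
import org.apache.spark.ml.stat.Correlation
import org.apache.spark.sql.{Row, SparkSession}

object BasicStatistics {
  def main(args: Array[String]): Unit = {
    // Create the session first; toDF only exists after importing
    // the implicits of a live SparkSession.
    val spark = SparkSession
      .builder
      .appName("BasicStatistics")
      .master("local")
      .getOrCreate()
    import spark.implicits._

    val data: Seq[Vector] = Seq(
      Vectors.sparse(4, Seq((0, 1.0), (3, -2.0))),
      Vectors.dense(4.0, 5.0, 0.0, 3.0),
      Vectors.dense(6.0, 7.0, 0.0, 8.0),
      Vectors.sparse(4, Seq((0, 9.0), (3, 1.0))))

    // toDF now resolves, and the rest of your example runs unchanged.
    val df = data.map(Tuple1.apply).toDF("features")
    val Row(coeff1: Matrix) = Correlation.corr(df, "features").head
    println(s"Pearson correlation matrix:\n $coeff1")

    spark.stop()
  }
}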
Hope it helps!
Upvotes: 1
Reputation: 1824
At this point, you don't have a SparkSession or anything started. I believe toDF comes from importing spark.implicits._, where spark is a SparkSession. The documentation sometimes does not make this clear and/or assumes you're working in the Spark shell, which creates the session automatically.

Your code does run in the Spark shell.
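For example (assuming Spark 2.3's spark-shell), the shell pre-creates a session named spark and runs import spark.implicits._ for you, so toDF works on a local Seq with no setup:

// Paste into spark-shell; no SparkSession boilerplate needed,
// because `spark` and spark.implicits._ are already in scope.
scala> import org.apache.spark.ml.linalg.Vectors
scala> Seq(Tuple1(Vectors.dense(1.0, 2.0, 3.0))).toDF("features").show()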
Upvotes: 0