Reputation: 919
I am very new to Spark Machine Learning (2 days old). I am executing the code below in the Spark shell, trying to predict a value. I found this error in an existing Stack Overflow post, but I was not able to fix my code with that solution, so I am posting the question again; apologies for the duplication.
Input data:
1.00,1.00,9.00
1.00,2.00,10.00
1.00,3.00,9.00
1.00,4.00,9.00
1.00,5.00,9.00
1.00,6.00,9.45
1.00,7.00,9.45
1.00,8.00,9.45
1.00,9.00,9.45
Code:
val df = spark.read.csv("/root/Predictiondata.csv").toDF("Userid", "Date", "Intime")
import org.apache.spark.sql.types.DoubleType
val featureDf = df.select( df("Userid").cast(DoubleType).as("Userid"),df("Date").cast(DoubleType).as("Date"),df("Intime").cast(DoubleType).as("Intime")).toDF()
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
val data = featureDf.select("Userid","Date","Intime").map(r => LabeledPoint(r(0).toString.toDouble,Vectors.dense(r(1).toString.toDouble,r(2).toString.toDouble))).toDF()
import org.apache.spark.ml.regression.LinearRegression
val lr = new LinearRegression()
val lrModel = lr.fit(data)
Error:
scala> val lrModel = lr.fit(data)
java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 but was actually org.apache.spark.mllib.linalg.VectorUDT@f71b0bce.
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42)
at org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:51)
at org.apache.spark.ml.Predictor.validateAndTransformSchema(Predictor.scala:72)
at org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:122)
at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
... 48 elided
Any help or suggestion is highly appreciated.
Thanks in advance.
Upvotes: 0
Views: 7878
Reputation: 18601
Please use the Spark 2+ DataFrame API together with VectorAssembler.
Something like this (I haven't tested it):
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import spark.implicits._
// Read the CSV with inferred numeric types instead of casting each column by hand
val data = spark.read
  .option("inferSchema", true)
  .csv("/root/Predictiondata.csv")
  .toDF("Userid", "Date", "Intime")
// Assemble the feature columns into the "features" vector column
// that LinearRegression expects by default
val dataWithFeatures = new VectorAssembler()
  .setInputCols(Array("Date", "Intime"))
  .setOutputCol("features")
  .transform(data)
// LinearRegression also expects the label column to be named "label"
val dataWithLabelFeatures = dataWithFeatures
  .withColumn("label", $"Userid")
val lrModel = new LinearRegression().fit(dataWithLabelFeatures)
Also, take a look at Pipeline.
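The same flow wrapped in a Pipeline might look like this (a minimal, untested sketch; the column names are taken from the question, and `data` is the DataFrame read above):

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression

// Stage 1: assemble the raw columns into a feature vector
val assembler = new VectorAssembler()
  .setInputCols(Array("Date", "Intime"))
  .setOutputCol("features")

// Stage 2: the regression, pointed at the question's columns directly
val lr = new LinearRegression()
  .setLabelCol("Userid")
  .setFeaturesCol("features")

// The pipeline runs the assembler and fits the regression in one step
val pipeline = new Pipeline().setStages(Array(assembler, lr))
val pipelineModel = pipeline.fit(data)
```

A fitted PipelineModel can then be applied to new data with `pipelineModel.transform(...)`, which keeps the feature assembly and the model consistent between training and prediction.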
Upvotes: 1
Reputation: 23109
If your Spark version is 2.x or later, import
org.apache.spark.ml.linalg.VectorUDT
and not
org.apache.spark.mllib.linalg.VectorUDT
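Concretely, applying this to the asker's code means importing the DataFrame-based `ml` types instead of the RDD-based `mllib` ones (an untested sketch; `featureDf` is the DataFrame built in the question):

```scala
// ml (DataFrame-based) types, not the mllib (RDD-based) ones
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.ml.regression.LinearRegression

// Same mapping as in the question, but now producing ml vectors,
// so the resulting "features" column has the type LinearRegression expects
val data = featureDf.select("Userid", "Date", "Intime")
  .map(r => LabeledPoint(
    r(0).toString.toDouble,
    Vectors.dense(r(1).toString.toDouble, r(2).toString.toDouble)))
  .toDF()

val lrModel = new LinearRegression().fit(data)
```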
Upvotes: 3