Reputation: 457
The Scala code I wrote gives me a type mismatch error. The main method, testpredict_02, takes a Vector, which I build with Vectors.dense from Double values.
val featuresMD = hiveContext.read.parquet("hdfs://machine01:9000/models/nb/metadata/features")
def testpredict_02(VData: Vector) = { MyModel.predict(VData) }
def outerpredict_02(argincome: String,argage: String,arggender: String) = {
featuresMD.registerTempTable("features_md")
val income = hiveContext.sql("select distinct income_index from features_md where income = argincome")
val age = hiveContext.sql("select distinct age_index from features_md where age = argage")
val gender = hiveContext.sql("select distinct gender_index from features_md where gender = arggender")
testpredict_02(Vectors.dense(income.select("income_index"), age.select("age_index"), gender.select("gender_index")))
}
Error:
<console>:43: error: type mismatch;
found : org.apache.spark.sql.DataFrame
required: Double
testpredict_02(Vectors.dense(income.select("income_index"), age.select("age_index")))
Please help.
Upvotes: 0
Views: 1470
Reputation: 37852
If you're sure each of the three DataFrames contains exactly one column and one record, you can take the first column of the first record from each of them (this also assumes the index columns are stored as Double):
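import org.apache.spark.sql.DataFrame
// Read column 0 of the first row as a Double; assumes the DataFrame is non-empty.
def getFirstCell(df: DataFrame): Double = df.first().getAs[Double](0)
// Inside outerpredict_02: build the feature vector from the three looked-up index values.
val vector: Vector = Vectors.dense(
  getFirstCell(income.select("income_index")),
  getFirstCell(age.select("age_index")),
  getFirstCell(gender.select("gender_index"))
)
testpredict_02(vector)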
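Separately from the type mismatch, note that the queries in the question contain the text argincome, argage and arggender inside plain string literals, so Scala does not substitute the method arguments; the SQL parser would most likely treat them as column names. A minimal sketch of one way to build the query text with Scala's s-interpolator follows. The helper name indexQuery and the sample value "high" are hypothetical and not from the original post, and the sketch assumes the lookup values contain no single quotes.

```scala
// Hypothetical helper (not from the original post): builds the lookup query
// with the Scala argument interpolated as a quoted SQL string literal.
def indexQuery(indexCol: String, filterCol: String, value: String): String = {
  // Assumption: values never contain a single quote; reject them rather than
  // guess at the SQL dialect's escaping rules.
  require(!value.contains("'"), s"unsupported quote in value: $value")
  s"select distinct $indexCol from features_md where $filterCol = '$value'"
}

// "high" is an illustrative value only.
val incomeQuery = indexQuery("income_index", "income", "high")
```

Inside outerpredict_02, the resulting string could then be passed to hiveContext.sql, for example hiveContext.sql(indexQuery("income_index", "income", argincome)).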
Upvotes: 1