Reputation: 3324
I am basing this question on this one. The OP says 'This problem doesn't exist in ML as it uses DataFrame and I can simply add another column with the score to my original dataframe.' Can anyone tell me how to do this? I have tried:
val labeledData = data1.select("labels", "hash-tfidf").rdd.map { row =>
  LabeledPoint(row.getAs[Double]("labels"), row.getAs[org.apache.spark.ml.linalg.SparseVector]("hash-tfidf"))
}
val scoreDF = model.transform(labeledData.toDS)
val dfPredictions = data1.withColumn("prediction", scoreDF.col("prediction"))
where data1 is my original dataframe with lots of columns. This errors with:
org.apache.spark.sql.AnalysisException: resolved attribute(s) prediction#1458 missing from ....[loads of fields I think from data1]...
What am I doing wrong?
Upvotes: 0
Views: 23
Reputation: 35229
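You don't need RDDs, you don't need LabeledPoint, and you cannot add a column from another DataFrame. withColumn only accepts column expressions derived from the DataFrame it is called on, which is what the AnalysisException is telling you.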
It is not clear what the model is, but I assume its input column is features, so you can either rename the column:
model.transform(data1.withColumnRenamed("hash-tfidf", "features"))
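or configure the model to accept hash-tfidf as its input column. Either way, transform keeps the columns of data1 and appends the prediction to them.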
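For the second option, here is a minimal sketch. Since the question does not say what the model is, it assumes a LogisticRegression purely for illustration; the toy rows, the id column, and the PredictionColumnSketch object are made up as well. Only the labels and hash-tfidf column names come from the question.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.{DataFrame, SparkSession}

object PredictionColumnSketch {

  // Builds a toy stand-in for data1, fits a model on it, and returns
  // the same rows with the model's output columns appended.
  def addPredictions(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical data: "id" stands for the "lots of columns" in data1.
    val data1 = Seq(
      (1L, 0.0, Vectors.sparse(3, Array(0), Array(1.0))),
      (2L, 1.0, Vectors.sparse(3, Array(2), Array(1.0))),
      (3L, 0.0, Vectors.sparse(3, Array(0, 1), Array(1.0, 0.5))),
      (4L, 1.0, Vectors.sparse(3, Array(1, 2), Array(0.5, 1.0)))
    ).toDF("id", "labels", "hash-tfidf")

    // Assumption: the model is a LogisticRegression. The point is that the
    // input columns are configured instead of renaming anything in data1.
    val lr = new LogisticRegression()
      .setLabelCol("labels")
      .setFeaturesCol("hash-tfidf")

    val model = lr.fit(data1)

    // No RDD, no LabeledPoint, no withColumn: transform works on the
    // DataFrame directly and keeps the existing columns.
    model.transform(data1)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("prediction-column-sketch")
      .getOrCreate()

    addPredictions(spark).select("id", "labels", "prediction").show()

    spark.stop()
  }
}
```

If the model is already fitted and is one of the standard predictors, calling model.setFeaturesCol("hash-tfidf") before transform should have the same effect, but that depends on the model type, which the question does not state.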
Upvotes: 1