Reputation: 1
Based on the tutorial from pyspark, I am trying to build a recommendation system using pyspark with RMSE as the evaluation metric. I would like to record the RMSE for each training epoch. However, the number of epochs is set when I create the ALS object, and it seems that I can only print the RMSE value after training is done. How can I print each epoch's RMSE using ALS from pyspark?
https://spark.apache.org/docs/latest/ml-collaborative-filtering.html
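For reference, this is roughly the setup from that page (a minimal sketch, assuming a ratings DataFrame already split into training and test with userId/movieId/rating columns); the RMSE is only available once fit() has finished:

from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS

# maxIter fixes the number of training iterations ("epochs") up front
als = ALS(maxIter=10, regParam=0.01,
          userCol="userId", itemCol="movieId", ratingCol="rating",
          coldStartStrategy="drop")
model = als.fit(training)

# RMSE can only be computed here, after training has completed
predictions = model.transform(test)
evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating",
                                predictionCol="prediction")
print("RMSE = " + str(evaluator.evaluate(predictions)))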
Upvotes: 0
Views: 227
Reputation: 6226
Without a reproducible example or the structure of your data it is hard to be specific, but here is a way to compute the RMSE per group; more details would let me tailor it further:
import pyspark.sql.functions as psf

def compute_RMSE(df, expected_col, actual_col, group_col):
    # Squared error per row, then mean and square root per group
    rmse = (
        df.withColumn(
            "squarederror",
            psf.pow(psf.col(actual_col) - psf.col(expected_col), psf.lit(2)),
        )
        .groupby(group_col)
        .agg(psf.avg(psf.col("squarederror")).alias("mse"))
        .withColumn("rmse", psf.sqrt(psf.col("mse")))
    )
    return rmse
And you could call the function like this, passing the DataFrame that holds both columns (more details about your data would help me be more specific):

compute_RMSE(old_df, "expectedcol", "realizedcol", "epoch")
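You didn't say where the epoch column would come from. As far as I know, pyspark.ml's ALS does not expose per-iteration metrics, so one possible sketch is to refit with an increasing maxIter and tag each prediction set with that iteration count; the training/test DataFrames and the userId/movieId/rating column names below are assumptions about your data, not something from your post:

from functools import reduce

import pyspark.sql.functions as psf
from pyspark.ml.recommendation import ALS

# Each refit with maxIter = n stands in for "training up to epoch n"
predictions_per_epoch = []
for n_iter in range(1, 6):
    als = ALS(maxIter=n_iter, regParam=0.01,
              userCol="userId", itemCol="movieId", ratingCol="rating",
              coldStartStrategy="drop")
    model = als.fit(training)
    preds = model.transform(test).withColumn("epoch", psf.lit(n_iter))
    predictions_per_epoch.append(preds)

# Stack the per-iteration predictions and compute one RMSE row per epoch
all_preds = reduce(lambda a, b: a.unionByName(b), predictions_per_epoch)
compute_RMSE(all_preds, "prediction", "rating", "epoch").show()

Note that this refits the model once per value of maxIter, so it costs roughly one full training run per "epoch" you want to track.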
Upvotes: 0