bereshine

Reputation: 1

How to record each epoch RMSE using ALS in pyspark

Based on the tutorial from pyspark, I am trying to create a recommendation system using pyspark with RMSE as the evaluation metric. I would like to record the RMSE for each training epoch. However, the number of epochs is set when I create the ALS object, and it seems that I can only print the RMSE value after training is done. How can I print each epoch's RMSE using ALS from pyspark?

https://spark.apache.org/docs/latest/ml-collaborative-filtering.html

Upvotes: 0

Views: 227

Answers (1)

linog

Reputation: 6226

Without a reproducible example or the structure of your data it is hard to be specific, but I can propose something for computing RMSE per group; more details might be needed:

import pyspark.sql.functions as psf

def compute_RMSE(expected_col, actual_col, group_col):
    # old_df is assumed to be the DataFrame holding both columns;
    # the method chain is wrapped in parentheses so it can span lines
    rmse = (old_df
            .withColumn("squarederror",
                        psf.pow(psf.col(actual_col) - psf.col(expected_col),
                                psf.lit(2)))
            .groupby(group_col)
            .agg(psf.avg(psf.col("squarederror")).alias("mse"))
            .withColumn("rmse", psf.sqrt(psf.col("mse"))))

    return rmse

And you could call the function like this (here some details might help to be more specific):

compute_RMSE("expectedcol", "realizedcol", "epoch")

Upvotes: 0
