Reputation: 51
I am using PySpark within Azure Databricks. I used the ALS algorithm from Spark's MLlib library to predict movie ratings, and that works. However, I am trying to add a DataFrame that consists of my own ratings for 10 randomly selected movies. When I do this, I only get predictions for movies I have already rated.
I want to use the model to get recommendations for myself based on those ratings.
I have Spark Code that performs the following tasks:
Import data (RatingsSmall, MoviesSmall, RatingsLarge, MoviesLarge)
Merge RatingsSmall with MoviesSmall, and merge RatingsLarge with MoviesLarge
Append the two merged datasets together
Drop irrelevant columns Timestamp and Genre
I now have a clean table with MovieID, Title (movie name), UserID and Rating. I will show the code from this point on. If you would like the code before this, I can post that too.
Split the Data into Training and Test Set (0.80, 0.20)
ALS Algorithm
Display predictions.
Hopefully the above helps guide you through the code below. I only get predictions for movies I have already rated.
I have tried appending my ratings to the training set. From there I would like to get recommendations or predictions for the other movies in the dataset.
My attempt: imported a DataFrame with my own ratings, appended it (unionAll) to the training set, and got predictions, but only for movies I have already rated.
code:
#Split dataset
training, test = All_Movies.randomSplit([0.8, 0.2])
from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator
#Set up model (named "als" so the imported ALS class is not overwritten)
als = ALS(maxIter=10, regParam=0.01, userCol="userId", itemCol="movieId", ratingCol="rating", coldStartStrategy="drop")
#Fit model to Training set and attach personal ratings
model = als.fit(training.unionAll(PersonalDF)) #PersonalDF is my ratings
#Get Predictions for Test Set
predictions = model.transform(test).dropna()
#All good up until here.
#Trying to get predicted ratings for my movies
#transform() only scores the (userId, movieId) rows present in PersonalDF
mySampledMovies = model.transform(PersonalDF)
mySampledMovies.createOrReplaceTempView("mySampledMovies")
display(sqlContext.sql("select userId, movieId, rating, title, prediction from mySampledMovies"))
I expect a DataFrame with my userId, movieId, rating and prediction. For movies I haven't seen, rating should be N/A or null and prediction should have a value.
Many Thanks
Upvotes: 1
Views: 290
Reputation: 631
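You need to filter a DataFrame of users down to your own user ID and then ask the model for recommendations for just that user.
# "user" is any DataFrame that contains the model's user column (userId in the question), e.g. PersonalDF
a = user.where(user.userId == your_user_id)
model.recommendForUserSubset(a, 5).show(1, False)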
For details, see https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.recommendation.ALSModel.html#pyspark.ml.recommendation.ALSModel.recommendForUserSubset
Upvotes: 0