ScientistBoy

Reputation: 41

Implicit recommendation with Spark ML and DataFrames

I am trying to use the new Spark ML library with DataFrames to build a recommender from implicit ratings. My code:

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import Row 

from pyspark.ml.recommendation import ALS

sc = SparkContext()
sqlContext = SQLContext(sc)

# create the dataframe (user x item)
df = sqlContext.createDataFrame(
    [(0, 0), (0, 1), (1, 1), (1, 2), (2, 1), (2, 2)],
    ["user", "item"])
als = ALS() \
    .setRank(10) \
    .setImplicitPrefs(True)
model = als.fit(df)
print("Rank %i " % model.rank)

model.userFactors.orderBy("id").collect()
test = sqlContext.createDataFrame([(0, 2), (1, 0), (2, 0)], ["user", "item"])
predictions = sorted(model.transform(test).collect(), key=lambda r: r[0])
for p in predictions:
    print(p)

However, I run into this error:

pyspark.sql.utils.AnalysisException: cannot resolve 'rating' given input columns user, item;

So I am not sure how to define the DataFrame.

Upvotes: 2

Views: 850

Answers (2)

antonio

Reputation: 21

I am confused, because the MLlib API has a separate call for implicit feedback:

http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html

val alpha = 0.01
val lambda = 0.01
val model = ALS.trainImplicit(ratings, rank, numIterations, lambda, alpha)

Upvotes: 2

xenocyon

Reputation: 2498

It appears you are trying to use (user, item) tuples, but you need (user, item, rating) triplets. By default, ALS looks for a column named `rating`, which is why the error says it cannot resolve 'rating'. Even for implicit feedback, you do need the ratings. You can use a constant like 1.0 if they are all the same.
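
A minimal sketch of that idea, assuming every observed (user, item) pair gets the same rating of 1.0. The helper names `add_constant_rating` and `fit_implicit_als` are made up for illustration and are not part of Spark. The Spark import is kept inside the second function, so the first helper can be tried without a Spark installation:

```python
def add_constant_rating(pairs, rating=1.0):
    """Turn (user, item) pairs into (user, item, rating) triplets."""
    return [(user, item, float(rating)) for user, item in pairs]


def fit_implicit_als(sql_context, pairs, rank=10):
    """Fit ALS with implicit preferences on triplets built from the pairs.

    Hypothetical helper: pyspark is imported lazily here, so this function
    only works where Spark is installed and a SQLContext is available.
    """
    from pyspark.ml.recommendation import ALS

    # The DataFrame now has the "rating" column that ALS expects by default.
    df = sql_context.createDataFrame(
        add_constant_rating(pairs), ["user", "item", "rating"])
    als = ALS(rank=rank, implicitPrefs=True,
              userCol="user", itemCol="item", ratingCol="rating")
    return als.fit(df)


# Same pairs as in the question.
pairs = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 1), (2, 2)]
triplets = add_constant_rating(pairs)
print(triplets[:2])
```

With the `sqlContext` from the question, `fit_implicit_als(sqlContext, pairs)` should return a fitted model that `transform` can be called on. If your data carries interaction counts (clicks, plays), those could be used as the rating instead of a constant.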

Upvotes: 1
