Multiple features in collaborative filtering - Spark

Question

I have a CSV file that looks like:

customer_ID, location, ....other info..., item-bought, score

I am trying to build a collaborative filtering recommender in Spark. Spark takes data of the form:

userID, itemID, value

But my data has more columns than that, and I want all of the user's information to be used instead of just userID. I tried grouping the user columns into one column, like this:

(customerID,location,....),itemID,score

but ALS.train gives me this error:

TypeError: int() argument must be a string or a number, not 'tuple'
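The message suggests that each user and item ID is passed through Python's `int()` before training. PySpark MLlib's `Rating` class appears to perform this conversion, so a tuple key cannot be accepted. The snippet below reproduces only that conversion step in plain Python. It is not Spark code, and `to_int_id` is a hypothetical stand-in.

```python
def to_int_id(value):
    """Hypothetical stand-in for the int() conversion applied to each ID."""
    return int(value)


print(to_int_id("42"))  # a plain string or number ID converts fine

composite_user = ("c42", "London")  # hypothetical grouped key, as in the question
try:
    to_int_id(composite_user)
except TypeError as err:
    # Same kind of error as in the question: int() cannot take a tuple.
    print("TypeError:", err)
```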

How can I make Spark accept multiple key/value columns instead of only three? Thanks.

Rohit Chatterjee · Accepted Answer

For each customer, identify the columns which you would like to use to distinguish these user-entities. Create a table (e.g. in SQL) in which each row contains the information for one user-entity, and use the row number in this table as the userID.

Do the same for your items if necessary, and pass these integer IDs to ALS (the recommender) instead of the raw column values.
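The approach above can be sketched in plain Python. This is a minimal illustration, not the answerer's code. It assumes the CSV has already been read into dictionaries and that `customer_ID` plus `location` together define a user-entity. The sample rows, `USER_COLUMNS`, `ITEM_COLUMNS`, `build_index`, and `to_ratings` are hypothetical. Each distinct combination receives a consecutive integer, which plays the role of the row number in the user-entity table.

```python
import csv
import io

# Hypothetical sample data in the shape described in the question;
# the "other info" columns are omitted for brevity.
SAMPLE_CSV = """customer_ID,location,item-bought,score
c1,London,apple,4.0
c1,Paris,apple,2.0
c2,London,pear,5.0
c1,London,pear,3.0
"""

# Columns assumed to distinguish one user-entity (or item-entity) from another.
USER_COLUMNS = ("customer_ID", "location")
ITEM_COLUMNS = ("item-bought",)


def build_index(keys):
    """Give each distinct key a consecutive integer ID in first-seen order."""
    index = {}
    for key in keys:
        if key not in index:
            index[key] = len(index)
    return index


def to_ratings(rows):
    """Turn parsed rows into (userID, itemID, score) triples of plain numbers."""
    user_keys = [tuple(row[c] for c in USER_COLUMNS) for row in rows]
    item_keys = [tuple(row[c] for c in ITEM_COLUMNS) for row in rows]
    user_index = build_index(user_keys)
    item_index = build_index(item_keys)
    ratings = [
        (user_index[u], item_index[i], float(row["score"]))
        for u, i, row in zip(user_keys, item_keys, rows)
    ]
    return ratings, user_index, item_index


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
ratings, user_index, item_index = to_ratings(rows)
print(ratings)
```

In PySpark, the resulting triples could then be distributed with `sc.parallelize(ratings)` and passed to `ALS.train`. Keep `user_index` and `item_index` so that predictions can be mapped back to the original customer/location and item values. For large data, the same indexing would normally be done inside Spark rather than in a Python dictionary on the driver.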
