Reputation: 319
I have a CSV file that looks like:
customer_ID, location, ....other info..., item-bought, score
I am trying to build a collaborative filtering recommender in Spark. Spark takes data of the form:
userID, itemID, value
but my data has more columns, and I want all of the user's information to be used instead of just the userID. I tried grouping the columns into a single column as:
(customerID,location,....),itemID,score
but ALS.train gives me this error:
TypeError: int() argument must be a string or a number, not 'tuple'
How can I make Spark accept multiple key/values rather than only three columns? Thanks.
Upvotes: 2
Views: 946
Reputation: 3107
For each customer, identify the columns which you would like to use to distinguish these user-entities. Create a table (e.g. in SQL) in which each row contains the information for one user-entity, and use the row number in this table as the userID.
Do the same for your items if necessary, and provide these integer IDs to your recommender in place of the original keys.
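A minimal sketch of that mapping in plain Python, in case it helps. The sample rows, the choice of (customer_ID, location) as the distinguishing columns, and the helper name build_index are all assumptions made for illustration; they are not taken from your data. The idea is only to replace each tuple with an integer so that every rating becomes a (userID, itemID, value) triple.

```python
import csv
import io

# Hypothetical sample data; column names and values are illustrative only.
SAMPLE_CSV = """customer_ID,location,item_bought,score
c1,NY,apple,5
c1,LA,pear,3
c2,NY,apple,4
c1,NY,pear,2
"""


def build_index(keys):
    """Assign consecutive integer IDs to distinct keys, in first-seen order."""
    index = {}
    for key in keys:
        if key not in index:
            index[key] = len(index)
    return index


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# A user-entity is the tuple of columns chosen to distinguish users (assumed here).
user_index = build_index((r["customer_ID"], r["location"]) for r in rows)
item_index = build_index(r["item_bought"] for r in rows)

# Each rating is now (int userID, int itemID, float value) with no tuples left.
ratings = [
    (
        user_index[(r["customer_ID"], r["location"])],
        item_index[r["item_bought"]],
        float(r["score"]),
    )
    for r in rows
]

print(ratings)
```

Triples of this shape are what ALS.train expects, so the TypeError about 'tuple' should no longer come up. Note that the model then treats each distinct combination as a separate user; it does not use location as a feature in its own right. For a dataset too large to index on the driver, the same mapping would have to be built in a distributed way instead.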
Upvotes: 1