Reputation: 1
I don't know how to write code that loads a CSV file or a .inter file instead of the built-in dataset in this example, which evaluates a dataset as a recommender system:
from surprise import SVD
from surprise import KNNBasic
from surprise import Dataset
from surprise.model_selection import cross_validate
# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k')
# Use the KNNBasic algorithm.
algo = KNNBasic()
# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
What would the full line of code look like if I only need to supply a data path and a filename? I have checked the Surprise website, but I didn't find anything. In other words, I don't want the movielens loading code from the example, but a line that loads a file from a given path.
Upvotes: 0
Views: 2076
Reputation: 41
First, you need to create an instance of Reader():
reader = Reader(line_format=u'rating user item', sep=',', rating_scale=(1, 6), skip_lines=1)
Note that the line_format parameter may only contain the field names 'rating', 'user' and 'item' (optionally, 'timestamp' may be added), and these names have nothing to do with the column names in your custom_rating.csv. That's why the skip_lines=1 parameter is set: it skips the first line of the CSV file, where the column names are usually defined. What line_format does determine is the order of the columns. So, just to be clear, my custom_rating.csv looks like this:
rating,userId,movieId
4,1,1
6,1,2
1,1,3
. . .
. . .
. . .
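To make the role of line_format and skip_lines more concrete, here is a small plain-Python sketch of the idea. It is not Surprise's actual implementation, and parse_line is a hypothetical helper written only for illustration: each value on a line is paired, by position, with the field name at the same position in line_format.

```python
def parse_line(line, line_format="rating user item", sep=","):
    """Pair each value on a line with the field name at the same position."""
    names = line_format.split()
    values = [value.strip() for value in line.split(sep)]
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(values)}")
    record = dict(zip(names, values))
    # The rating is numeric; user and item ids are kept as plain strings here.
    record["rating"] = float(record["rating"])
    return record


rows = ["rating,userId,movieId", "4,1,1", "6,1,2", "1,1,3"]

# rows[1:] drops the header line, which is what skip_lines=1 is for.
parsed = [parse_line(row) for row in rows[1:]]
print(parsed[0])
```

If the file instead had the columns in the order userId,movieId,rating, the matching line_format would be 'user item rating'.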
Now you can create your data instance:
data = Dataset.load_from_file("custom_rating.csv", reader=reader)
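If you only want to pass a data path and a filename, the steps above could be wrapped roughly like this. This is a sketch, not something taken from the Surprise docs: ratings_path and evaluate_file are names I made up, the defaults match my custom_rating.csv, and the paths in the comments are placeholders. For a .inter file I am assuming it is tab-separated with one header line and the columns in the order user, item, rating, timestamp; check your file and adjust line_format, sep and rating_scale if it differs.

```python
import os


def ratings_path(datapath, filename):
    """Join a directory and a file name into a single path."""
    return os.path.join(datapath, filename)


def evaluate_file(datapath, filename, line_format="rating user item",
                  sep=",", rating_scale=(1, 6), skip_lines=1):
    """Load a custom ratings file and run 5-fold cross-validation on it."""
    # Imported here so ratings_path() can be used without Surprise installed.
    from surprise import Dataset, KNNBasic, Reader
    from surprise.model_selection import cross_validate

    reader = Reader(line_format=line_format, sep=sep,
                    rating_scale=rating_scale, skip_lines=skip_lines)
    data = Dataset.load_from_file(ratings_path(datapath, filename), reader=reader)
    return cross_validate(KNNBasic(), data, measures=["RMSE", "MAE"],
                          cv=5, verbose=True)


# Example calls (the paths and the .inter layout are assumptions):
# evaluate_file("/path/to/data", "custom_rating.csv")
# evaluate_file("/path/to/data", "ratings.inter",
#               line_format="user item rating timestamp", sep="\t")
```

This replaces the Dataset.load_builtin('ml-100k') line from the question; the rest of the cross-validation call stays the same.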
Finally, you can proceed with creating the SVD model as shown in the examples:
# sample random trainset and testset
# test set is made of 20% of the ratings.
trainset, testset = train_test_split(data, test_size=.2)
# We'll use the famous SVD algorithm.
algo = SVD()
# Train the algorithm on the trainset, and predict ratings for the testset
algo.fit(trainset)
predictions = algo.test(testset)
# Then compute RMSE
accuracy.rmse(predictions)
PS: Also, don't forget to import the libraries at the beginning of your code :)
from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise import Reader
from surprise.model_selection import train_test_split
Upvotes: 0