welu

Reputation: 339

How to create a sparse matrix from a DataFrame in a specific format

I am working with Python 3.5 and a DataFrame with columns = ['users_id', 'item_id', 'rating', 'timestamp', 'title'], and I am using model = LightFM(loss='warp') as the recommender model.

For training, I need a sparse matrix in a specific format => (users_id, item_id) rating

like this

But I have never succeeded. When I use scipy.sparse.csr_matrix(data['users_id']), it gives me something like this:

(0,0) 5

(0,1) 5

(0,2) 4

(0,3) 5

How should I proceed?

Upvotes: 4

Views: 1195

Answers (1)

VinceDld

Reputation: 86

If you want to create a sparse matrix to use in your LightFM model afterwards, I think you should use the Dataset object provided by the library. For example, if I call your DataFrame df:

from lightfm.data import Dataset

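data = Dataset()
# Map every distinct user and item id to an internal index
data.fit(df.users_id.unique(), df.item_id.unique())
# Keep only (users_id, item_id, rating) and pass the rows as tuples
interactions_matrix, weights_matrix = data.build_interactions([tuple(i) for i in df.drop(['timestamp', 'title'], axis = 1).values])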

The fit method maps your users_id and item_id values to internal ids, and the build_interactions method creates two sparse matrices: a binary one containing only the interactions between users and items, and another one containing the weights (i.e. ratings). It takes an iterable of (user_id, item_id) or (user_id, item_id, weight) tuples as its parameter.
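To make the id mapping more concrete, here is a rough sketch of a similar construction done by hand with pandas and scipy only. It is not what LightFM does internally, just an illustration under assumptions: the toy DataFrame is made up, and pd.factorize stands in for the mapping that Dataset.fit performs.

```python
import pandas as pd
from scipy.sparse import coo_matrix

# Hypothetical toy data with the same column names as in the question
df = pd.DataFrame({
    "users_id": [10, 10, 42, 7],
    "item_id": ["a", "b", "a", "c"],
    "rating": [5, 4, 3, 5],
})

# factorize gives each distinct id a contiguous index (order of first appearance)
user_idx, user_ids = pd.factorize(df["users_id"])
item_idx, item_ids = pd.factorize(df["item_id"])

# Weights matrix: entry (user index, item index) holds the rating
weights = coo_matrix(
    (df["rating"].to_numpy(dtype="float32"), (user_idx, item_idx)),
    shape=(len(user_ids), len(item_ids)),
)

# Binary interactions matrix: same positions, every stored value set to 1
interactions = weights.copy()
interactions.data[:] = 1
```

Note that if the same (users_id, item_id) pair appears several times, scipy sums the duplicate entries when the matrix is converted to CSR or to a dense array, so you may want to deduplicate the DataFrame first.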

Then you can use these two matrices created with build_interactions to fit your model in LightFM.

from lightfm import LightFM

model = LightFM(loss='warp')
model.fit(interactions_matrix, sample_weight = weights_matrix)

You can find more information in the LightFM documentation, for example in the section about Building Datasets or in the Quickstart.

Upvotes: 7
