Content-based recommender system with sklearn or numpy

Question

I am trying to build a content-based recommender system in Python (pandas/numpy/sklearn).

Here are the matrices involved and their sizes:

X: n_customers * n_features (contains the features of each customer)

Y: n_customers * n_products (contains the scores given by each customer to each product)

Theta: n_features * n_products

The aim is to learn Theta so that I can predict the score a customer would give to every product (X*Theta). Y is a sparse matrix: each customer scores only a very small percentage of the products, which is why Y contains a lot of NaN values.

Here is my problem:

This is a regression problem with many targets (here, one target per product), but I want to run the regression only on the non-null values. Because the number of NaN values differs from one product to another, how can I vectorize that?

Assume there are 1000 products and 100 000 customers, each one having 20 features.

For each product I need to run the regression on the non-null values only. So without vectorization, I would need 1000 different regressors, each one learning a Theta vector of length 20.
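To make the non-vectorized version concrete, here is a minimal sketch of the per-product loop. The sizes are small stand-ins for the real ones, the data is random, and the choice of Ridge with alpha=1.0 and no intercept is an assumption made only for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_customers, n_features, n_products = 300, 4, 6  # small stand-in sizes

# Synthetic data (assumption): scores are linear in the customer features.
X = rng.normal(size=(n_customers, n_features))
Y = X @ rng.normal(size=(n_features, n_products))
Y[rng.random(Y.shape) < 0.7] = np.nan  # most scores are missing

Theta = np.empty((n_features, n_products))
for j in range(n_products):
    rated = ~np.isnan(Y[:, j])  # customers who scored product j
    model = Ridge(alpha=1.0, fit_intercept=False)
    model.fit(X[rated], Y[rated, j])  # regress on the non-null rows only
    Theta[:, j] = model.coef_

predictions = X @ Theta  # predicted score of every customer for every product
```

Each column of Theta is fitted on a different subset of rows, which is exactly what makes a single multi-target fit on Y impossible as long as Y contains NaN.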

If possible, I would like to solve this problem with sklearn. Ridge regression, for example, supports multiple targets (Y as a matrix).
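One possible way to avoid the loop, sketched in plain NumPy rather than sklearn: if ridge regression without an intercept is acceptable (an assumption), each product j has the closed-form solution (X^T M_j X + alpha*I)^-1 X^T M_j y_j, where M_j is the 0/1 mask of customers who scored product j. All of these small systems can be built and solved in one batch. The sizes, the random data and alpha below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_customers, n_features, n_products = 300, 4, 6  # small stand-in sizes
alpha = 1.0  # ridge penalty, arbitrary value for illustration

X = rng.normal(size=(n_customers, n_features))
Y = X @ rng.normal(size=(n_features, n_products))
Y[rng.random(Y.shape) < 0.7] = np.nan  # most scores are missing

M = (~np.isnan(Y)).astype(float)  # 1.0 where a score exists, 0.0 elsewhere
Y0 = np.nan_to_num(Y)             # NaN -> 0, so missing scores contribute nothing

# Row-wise outer products x_i x_i^T, flattened to length n_features**2.
XX = (X[:, :, None] * X[:, None, :]).reshape(n_customers, -1)

# A[j] = X^T diag(M[:, j]) X + alpha * I, for all products at once.
A = (M.T @ XX).reshape(n_products, n_features, n_features)
A += alpha * np.eye(n_features)

# B[j] = X^T (M[:, j] * Y[:, j]); Y0 is already zero where the mask is zero.
B = Y0.T @ X

# Batched solve of n_products small linear systems.
Theta = np.linalg.solve(A, B[:, :, None])[:, :, 0].T  # (n_features, n_products)
```

At the sizes mentioned above (100 000 customers, 20 features), the intermediate XX array holds 100 000 * 400 float64 values, roughly 320 MB, so it may be worth accumulating A over chunks of customers instead of building XX in one go.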

I hope it's clear enough.

Thank you for your help.
