rehan ali

Reputation: 91

Dimensionality reduction for high dimensional sparse data before clustering or spherical k-means?

I am trying to build my first recommender system, where I create a user feature space and then cluster the users into different groups. To generate recommendations for a particular user, I first find the cluster to which the user belongs and then recommend entities (items) in which his/her nearest neighbors showed interest. The data I am working with is high-dimensional and sparse. Before implementing the above approach, I have a few questions whose answers might help me adopt a better approach.

  1. As my data is high-dimensional and sparse, should I apply dimensionality reduction and then cluster, or should I go for an algorithm like spherical k-means that works directly on sparse, high-dimensional data?

  2. How should I find the nearest neighbors after creating the clusters of users? Which distance measure should I use, given that I have read Euclidean distance is not a good measure for high-dimensional data? (A rough sketch of what I currently have in mind follows below.)
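
Here is a minimal sketch of the pipeline I currently have in mind, assuming scikit-learn's TruncatedSVD for the reduction step, L2-normalization plus ordinary KMeans as a stand-in for spherical k-means, and cosine similarity for the neighbor search (the matrix below is only placeholder data):

    import numpy as np
    from scipy.sparse import random as sparse_random
    from sklearn.decomposition import TruncatedSVD
    from sklearn.preprocessing import normalize
    from sklearn.cluster import KMeans
    from sklearn.metrics.pairwise import cosine_similarity

    # Placeholder sparse user-feature matrix standing in for the real data.
    X = sparse_random(1000, 5000, density=0.01, format="csr", random_state=0)

    # 1. Dimensionality reduction that accepts sparse input directly.
    svd = TruncatedSVD(n_components=50, random_state=0)
    X_reduced = svd.fit_transform(X)

    # 2. L2-normalize so plain k-means on the reduced vectors behaves like
    #    spherical k-means (Euclidean distance on unit vectors tracks cosine distance).
    X_norm = normalize(X_reduced)
    labels = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(X_norm)

    # 3. Nearest neighbors within the user's own cluster, ranked by cosine
    #    similarity rather than raw Euclidean distance.
    user_idx = 0
    members = np.where(labels == labels[user_idx])[0]
    sims = cosine_similarity(X_norm[user_idx].reshape(1, -1), X_norm[members]).ravel()
    neighbors = members[np.argsort(-sims)][1:11]  # top 10, skipping the user itself
    print(neighbors)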

Upvotes: 2

Views: 662

Answers (1)

Dan Jarratt

Reputation: 380

It's not obvious that clustering is the right algorithm here. Clustering is great for data exploration and analysis, but not always for prediction. If your end product is based around the concept of "groups of like users" and the items they share, then go ahead with clustering and simply present a ranked list of items that each user's cluster has consumed (or a weighted average rating, if you have preference information).
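
For example, here is a minimal sketch of that cluster-popularity idea, assuming you already have a binary user-item interaction matrix and a cluster label per user (both are placeholder arrays below):

    import numpy as np

    # Placeholder data: `interactions` is a binary user-by-item matrix and
    # `labels` holds each user's cluster id (swap in your real arrays).
    rng = np.random.default_rng(0)
    interactions = (rng.random((1000, 500)) < 0.02).astype(int)
    labels = rng.integers(0, 20, size=1000)

    def recommend_for_user(user_idx, top_n=10):
        # Count how many users in the same cluster consumed each item.
        members = labels == labels[user_idx]
        counts = interactions[members].sum(axis=0)
        # Mask items the user has already consumed, then rank the rest.
        counts[interactions[user_idx] > 0] = -1
        return np.argsort(-counts)[:top_n]

    print(recommend_for_user(0))

With explicit ratings instead of binary interactions, you would replace the count with a (weighted) average rating per item over the cluster members.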

You might try standard recommender algorithms that work in sparse high-dimensional situations, such as item-item collaborative filtering or sparse SVD.
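A rough sketch of both options, using a placeholder sparse user-item rating matrix and scipy/scikit-learn purely for illustration:

    import numpy as np
    from scipy.sparse import random as sparse_random
    from scipy.sparse.linalg import svds
    from sklearn.metrics.pairwise import cosine_similarity

    # Placeholder sparse user-item rating matrix.
    R = sparse_random(1000, 500, density=0.02, format="csr", random_state=0)

    # Item-item collaborative filtering: cosine similarity between item columns,
    # then score items for each user by similarity to what they already rated.
    item_sims = cosine_similarity(R.T)      # (items x items), accepts sparse input
    np.fill_diagonal(item_sims, 0.0)        # drop self-similarity
    scores = R @ item_sims                  # dense (users x items) score matrix

    # Sparse SVD alternative: a k-dimensional low-rank approximation of R.
    U, s, Vt = svds(R, k=20)
    approx = U @ np.diag(s) @ Vt            # reconstructed preference estimates
    print(scores.shape, approx.shape)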

Upvotes: 1
