Reputation: 1368
I am using svm-rank.
When running svm_rank_learn on a tiny dataset:
Training set properties: 3 features, 12 rankings, 596 examples
The run finishes in a few seconds and I get a valid model. But when I use a slightly larger dataset:
Training set properties: 3 features, 30 rankings, 1580 examples
The run gets stuck on iteration 29 for hours. This is very strange, since the documentation states that svm-rank "scales linearly in the number of rankings (i.e. queries)".
What is wrong with my dataset or format?
Upvotes: 0
Views: 621
Reputation: 107
Your feature values fall into different ranges. Try scaling your features across samples so that every feature has zero mean and unit variance. It also helps to normalize the features within every single sample. These two steps speed up the calculations immensely.
Scikit-learn has a nice introduction to data preprocessing and also provides methods that make these steps easy; see http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing.
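For example, something along these lines (just a sketch, assuming your training file is in the usual svm-rank/SVMlight format; the file names are placeholders):

```python
from sklearn.datasets import load_svmlight_file, dump_svmlight_file
from sklearn.preprocessing import StandardScaler, Normalizer

# svm-rank uses the SVMlight format with a qid per example;
# "train.dat" / "train_scaled.dat" are placeholder file names.
X, y, qid = load_svmlight_file("train.dat", query_id=True)
X = X.toarray()  # only 3 features, so a dense array is cheap here

# 1) zero mean / unit variance per feature, computed across all examples
X = StandardScaler().fit_transform(X)
# 2) unit L2 norm per example (row-wise normalization)
X = Normalizer().fit_transform(X)

# Write the rescaled data back out for svm_rank_learn (1-based feature indices)
dump_svmlight_file(X, y, "train_scaled.dat", query_id=qid, zero_based=False)
```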
Upvotes: 0
Reputation: 16197
From the svm_rank page (http://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html):

> However, since I did not want to spend more than an afternoon on coding SVMrank, I only implemented a simple separation oracle that is quadratic in the number of items in each ranking (not the O(k*log(k)) separation oracle described in [Joachims, 2006]).
You are more or less tripling the number of examples, so with a quadratic separation oracle you'd expect the time to increase by a factor of about 9.
> [S]ince the documentation states that svm-rank "scales linearly in the number of rankings (i.e. queries)"
You also scale the number of rankings by a factor of a bit more than 2. Combining both of these, you'd expect the training to take around 20 times longer.
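A quick back-of-the-envelope version of that estimate, using the numbers from the two training-set summaries in the question:

```python
# Rough estimate only, combining the two factors described above.
examples_small, examples_large = 596, 1580
rankings_small, rankings_large = 12, 30

quadratic = (examples_large / examples_small) ** 2  # quadratic separation oracle: ~7x
linear = rankings_large / rankings_small            # ~2.5x more rankings

print(quadratic * linear)  # ~17.6, i.e. roughly 20 times longer
```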
This doesn't explain why it would go from a few seconds to multiple hours.
Upvotes: 1