Reputation: 1368
I am using svm-rank.
When running svm_rank_learn on a tiny dataset:
Training set properties: 3 features, 12 rankings, 596 examples
The run finishes in a few seconds and I get a valid model. But when I use a slightly larger dataset:
Training set properties: 3 features, 30 rankings, 1580 examples
The run gets stuck on iteration 29 for hours. This is very strange, since the documentation states that svm-rank "scales linearly in the number of rankings (i.e. queries)".
What is wrong with my dataset or format?
Upvotes: 0
Views: 621
Reputation: 107
Your feature values fall into different ranges. Try scaling your features across samples so that every feature has zero mean and unit variance. It also helps to normalize the features within every single sample. These two steps speed up the calculations immensely.
Scikit-learn has a nice introduction to data preprocessing and also provides methods that make these steps easy; see http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing.
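For example, something along these lines (just a sketch, assuming your training file is in the usual svm-rank/SVMlight format; the file names are placeholders):

```python
from sklearn.datasets import load_svmlight_file, dump_svmlight_file
from sklearn.preprocessing import StandardScaler, Normalizer

# svm-rank uses the SVMlight format with a qid per example;
# "train.dat" / "train_scaled.dat" are placeholder file names.
X, y, qid = load_svmlight_file("train.dat", query_id=True)
X = X.toarray()  # only 3 features, so a dense array is cheap here

# 1) zero mean / unit variance per feature, computed across all examples
X = StandardScaler().fit_transform(X)
# 2) unit L2 norm per example (row-wise normalization)
X = Normalizer().fit_transform(X)

# Write the rescaled data back out for svm_rank_learn (1-based feature indices)
dump_svmlight_file(X, y, "train_scaled.dat", query_id=qid, zero_based=False)
```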
Upvotes: 0
Reputation: 16197
From the svm_rank page (http://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html):

> However, since I did not want to spend more than an afternoon on coding SVMrank, I only implemented a simple separation oracle that is quadratic in the number of items in each ranking (not the O(k*log(k)) separation oracle described in [Joachims, 2006]).
You are more or less tripling the number of examples, so with a quadratic separation oracle you'd expect the time to increase by a factor of about 9.
> [S]ince the documentation states that svm-rank "scales linearly in the number of rankings (i.e. queries)"
You also scale the number of rankings by a factor of a bit more than 2. Combining both of these, you'd expect the training to take around 20 times longer.
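A quick back-of-the-envelope version of that estimate, using the numbers from the two training-set summaries in the question:

```python
# Rough estimate only, combining the two factors described above.
examples_small, examples_large = 596, 1580
rankings_small, rankings_large = 12, 30

quadratic = (examples_large / examples_small) ** 2  # quadratic separation oracle: ~7x
linear = rankings_large / rankings_small            # ~2.5x more rankings

print(quadratic * linear)  # ~17.6, i.e. roughly 20 times longer
```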
This doesn't explain why it would go from a few seconds to multiple hours.
Upvotes: 1