Reputation: 787
LIBSVM, Java, grid search, performance, slows down
I have been working with the Java version of LIBSVM. In outline, I run a naive grid search for the optimal C and gamma pair, save the resulting training model files, and then perform cross-validation against 10 k-fold data sets to find the best parameters.
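For readers unfamiliar with the approach, a naive grid search of this kind can be sketched roughly as follows. This is only an illustration: the class name `NaiveGrid`, the powers-of-two grid, and the `score` callback (standing in for a cross-validation accuracy) are all assumptions, and nothing here calls LIBSVM.

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical sketch: exhaustive search over (C, gamma) = (2^i, 2^j).
final class NaiveGrid {
    // Returns {bestC, bestGamma, bestScore}; higher scores are better.
    static double[] search(int cMinExp, int cMaxExp,
                           int gMinExp, int gMaxExp,
                           DoubleBinaryOperator score) {
        double[] best = {Double.NaN, Double.NaN, Double.NEGATIVE_INFINITY};
        for (int i = cMinExp; i <= cMaxExp; i++) {
            for (int j = gMinExp; j <= gMaxExp; j++) {
                double c = Math.pow(2, i);
                double gamma = Math.pow(2, j);
                // In a real search this would train a model and cross-validate it.
                double s = score.applyAsDouble(c, gamma);
                if (s > best[2]) {
                    best = new double[] {c, gamma, s};
                }
            }
        }
        return best;
    }
}
```

Every pair is evaluated independently, so the cost of one evaluation should not depend on how many pairs came before it.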
I noticed what seemed to be anecdotal slowdowns as svm_predict is called repeatedly during the grid search. At first I thought this was simply a fluke, but I have been testing carefully, and the testing indicates that the processing time for svm_predict increases sharply, seemingly exponentially, with the number of times it has been called.
On the first call, svm_predict takes ~15 milliseconds to perform the predictions on my machine. By the 500th sequential call, it takes ~541 milliseconds. By the 1000th sequential call, it takes ~8931 milliseconds. By the 1220th call, it is at ~21260 milliseconds per call.
(NOTE: the increases in time do not appear to be related to the C-gamma pairs themselves. The time to process increases consistently even if the pairs are randomized; that is, the model itself is not increasing in complexity.)
I have run the software in a profiler and see no obvious memory leaks or any memory issues at all: both heap and stack usage remain fairly stable or show oscillations well within the allocated memory limits. Testing also suggests that garbage collection does not affect performance at all.
My software "wraps" LIBSVM internally. The grid search merely runs through a range of C-gamma pairs one at a time, calling svm_predict on each training file to measure performance.
Has anyone else seen this issue? Is there a fix? Grid search is very intensive anyway, but with times quickly climbing to 21 seconds per prediction, even a fairly basic search (400 C-gamma pairs) becomes very time-consuming, even on high-end equipment. Any advice?
NEW INFO (10 Oct 2014): I continue to test and can tentatively confirm that the issue seems to be LIBSVM slowing down with repeated calls to svm_predict during a grid search.
I also have a test harness to manually test svm_predict based on previously generated MODEL and DATA files. That is, I can manually test the prediction for each model-data file pair. After 648 iterations of the grid search, the elapsed time to predict is 1183 milliseconds per file. Running a single instance of svm_predict manually on precisely the same model-data file pair takes 34 milliseconds. This confirms my concerns about svm_predict. Has anyone else seen this, or does anyone have a workable remedy to suggest?
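The kind of per-call measurement described above can be sketched as follows. This is not my actual harness: `CallTimer` and `timeCalls` are hypothetical names, and the workload in `main` is a placeholder where a real test would invoke the prediction on a model-data file pair.

```java
import java.util.function.IntConsumer;

// Hypothetical sketch: record the elapsed time of each call separately,
// so a drift between early and late calls becomes visible.
final class CallTimer {
    // Runs task.accept(i) for i = 0..calls-1 and returns per-call times in ms.
    static long[] timeCalls(int calls, IntConsumer task) {
        long[] elapsedMs = new long[calls];
        for (int i = 0; i < calls; i++) {
            long start = System.nanoTime();
            task.accept(i);
            elapsedMs[i] = (System.nanoTime() - start) / 1_000_000L;
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the call under test.
        long[] ms = timeCalls(1000, i -> Math.sqrt(i));
        System.out.println("first call: " + ms[0] + " ms");
        System.out.println("last call:  " + ms[ms.length - 1] + " ms");
    }
}
```

If the last call is consistently much slower than the first for identical inputs, something outside the call's inputs (for example, accumulated static state) is a likely suspect.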
Upvotes: 0
Views: 412
Reputation: 787
The problem encountered was NOT in the native LIBSVM Java libraries.
The problem arose from an error in my grid search code. Since others may face this when implementing their own code, I provide a quick answer.
I had added a simple class to the native SVM library that aggregates the output of the static svm_predict class. I had also previously added a simple method to svm_predict to "reset" the static svm_predict class. Unfortunately, I omitted the call to that reset in the grid search method. As a result, other processing that used the simple class added to svm_predict increased exponentially in processing time, which caused the apparent slowdown.
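A minimal sketch of this kind of bug, for illustration only. It is not my actual code: `PredictionLog` and `GridSearchSketch` are hypothetical names, nothing here is part of LIBSVM, and the sketch does not try to reproduce the exact growth curve reported in the question. It only shows how static state that is never reset keeps growing across grid points.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical aggregator holding prediction output in static state.
final class PredictionLog {
    private static final List<Double> RESULTS = new ArrayList<>();

    static void record(double prediction) { RESULTS.add(prediction); }

    // Walks every stored result, so its cost grows with the list size.
    static double mean() {
        double sum = 0.0;
        for (double r : RESULTS) sum += r;
        return RESULTS.isEmpty() ? 0.0 : sum / RESULTS.size();
    }

    static int size() { return RESULTS.size(); }

    static void reset() { RESULTS.clear(); }
}

final class GridSearchSketch {
    // Simulates `pairs` grid points with `rows` predictions each and returns
    // how many results the aggregator holds after the last grid point.
    static int run(int pairs, int rows, boolean resetBetweenPairs) {
        PredictionLog.reset(); // clean start for the whole search
        for (int p = 0; p < pairs; p++) {
            if (resetBetweenPairs) PredictionLog.reset(); // the call I had omitted
            for (int r = 0; r < rows; r++) PredictionLog.record(r % 2);
            PredictionLog.mean(); // post-processing over everything stored so far
        }
        return PredictionLog.size();
    }

    public static void main(String[] args) {
        System.out.println(run(400, 100, false)); // prints 40000
        System.out.println(run(400, 100, true));  // prints 100
    }
}
```

Without the reset, the post-processing at the last grid point walks the output of all 400 grid points instead of one, even though the prediction itself costs the same every time.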
While this was a silly error in retrospect (and took quite some time to identify), the testing did confirm that svm_predict itself seems fairly efficient. I tested repeatedly (tens of thousands of tests), and svm_predict consistently yielded good results for my purposes: under 15 ms per prediction batch.
For context, the erroneous code was taking over 2 hours to complete all of the grid search tests. After the bug fix, run time fell to 56 seconds for exactly the same grid search tests. (No, that is not a misprint but a testament to how even simple algorithms can have a profound effect on processing time.)
Upvotes: 1