Ivan Novikov

Reputation: 578

Is k nearest neighbours regression inherently slow?

I am trying to use the k-nearest-neighbours implementation from scikit-learn on a fairly large dataset. The problem is that predictions take a very long time, almost as long as training, which doesn't make sense to me. Is this an issue with the algorithm itself, or is it that scikit-learn isn't made for large datasets (no GPU support)?

For further information, I am trying to predict lidar intensity from x, y, z and the object label. Each lidar scan has ~100,000 points, and I need a predicted intensity for every point.

Upvotes: 3

Views: 6193

Answers (1)

rth

Reputation: 11201

Things to try to make scikit-learn's KNeighborsClassifier (or KNeighborsRegressor, which takes the same parameters) run faster:

  • try a different value for the algorithm parameter: kd_tree or ball_tree for low-dimensional data, brute for high-dimensional data
  • set the n_jobs parameter to run neighbour searches in parallel. A larger n_jobs doesn't necessarily make things faster; sometimes it has the opposite effect, so measure it.
  • make sure you are using the latest version: there were performance improvements in v0.22, and some optimizations are not yet merged (scikit-learn#14543)
  • use an external approximate nearest neighbours library (e.g. Annoy) to pre-compute a sparse distance matrix, then pass it to the estimator with metric="precomputed"
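
A minimal sketch of the first, second and last points, applied to KNeighborsRegressor since the question is about regression. The data is synthetic (random x, y, z plus a numeric label column, which is an assumption about how the label is encoded), and the sizes and k = 5 are arbitrary. For the precomputed route, scikit-learn's own KNeighborsTransformer (available from v0.22) stands in for an approximate library such as Annoy: it builds the same kind of sparse distance graph, only exactly rather than approximately.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor, KNeighborsTransformer

rng = np.random.default_rng(0)

# Synthetic stand-in for lidar points: columns are x, y, z and a numeric label.
X_train = rng.random((5000, 4))
y_train = rng.random(5000)          # stand-in for intensity
X_query = rng.random((1000, 4))

k = 5  # arbitrary choice for illustration

# Tree-based search with parallel queries; worth timing against the default.
tree_knn = KNeighborsRegressor(n_neighbors=k, algorithm="kd_tree", n_jobs=2)
tree_knn.fit(X_train, y_train)
pred_tree = tree_knn.predict(X_query)

# Brute-force search, for comparison on the same data.
brute_knn = KNeighborsRegressor(n_neighbors=k, algorithm="brute")
brute_knn.fit(X_train, y_train)
pred_brute = brute_knn.predict(X_query)

# Precomputed sparse distance graph. An approximate library would replace
# this transformer; the regressor only sees the sparse matrix.
transformer = KNeighborsTransformer(n_neighbors=k, mode="distance")
graph_train = transformer.fit_transform(X_train)   # shape (n_train, n_train)
graph_query = transformer.transform(X_query)       # shape (n_query, n_train)

pre_knn = KNeighborsRegressor(n_neighbors=k, metric="precomputed")
pre_knn.fit(graph_train, y_train)
pred_pre = pre_knn.predict(graph_query)
```

Which setting is fastest depends on the data and the machine, so it is worth wrapping each predict call in a timer (for example time.perf_counter) on a real scan before committing to one.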

Upvotes: 2
