Miranda

Reputation: 605

Setting the parameters of the locally linear embedding (LLE) method in Scikit-learn for dimensionality reduction

I'm using the locally linear embedding (LLE) method in Scikit-learn for dimensionality reduction. The only examples I could find are in the Scikit-learn documentation here and here, but I'm not sure how I should choose the parameters of the method. In particular, is there any relation between the dimension of the data points or the number of samples and the number of neighbors (n_neighbors) and the number of components (n_components)? All of the examples in Scikit-learn use n_components=2; is this always the case? Finally, is there any other parameter that is critical to tune, or should I use the default settings for the rest of the parameters?

Upvotes: 0

Views: 1768

Answers (2)

Daniel

Reputation: 2295

Is there any relation between the dimension of data points or the number of samples and the number of neighbors (n_neighbors) and number of components (n_components)?

Generally speaking, they are not directly related. n_neighbors is usually chosen based on the distances among samples (it must, of course, be smaller than the number of samples). In particular, if you know the classes of your samples, you'd better set n_neighbors a little greater than the number of samples in each class. n_components, namely the size of the reduced dimension, is determined by the redundancy of the data in the original dimension (and cannot exceed that dimension). Based on the specific data distribution and your own needs, you can choose a suitable dimension for the projection.

n_components=2 maps the original high-dimensional space into a 2D space. It is just a special case, not a requirement.
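As a rough illustration of trying several values rather than relying on n_components=2, here is a minimal sketch. The swiss-roll toy data, the candidate values, and the choice of eigen_solver="dense" are assumptions made for the example, not recommendations. The reconstruction_error_ attribute is only one rough signal; inspecting the embedding for your own task matters more.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Toy data (an assumption for this sketch): 300 points in 3 dimensions.
X, _ = make_swiss_roll(n_samples=300, random_state=0)

results = {}
for n_neighbors in (10, 20):
    for n_components in (1, 2):
        lle = LocallyLinearEmbedding(
            n_neighbors=n_neighbors,
            n_components=n_components,
            eigen_solver="dense",  # small data set, so the dense solver is affordable
        )
        embedding = lle.fit_transform(X)
        # Store the embedding shape and the reconstruction error of the fit.
        results[(n_neighbors, n_components)] = (
            embedding.shape,
            lle.reconstruction_error_,
        )

for key, (shape, err) in results.items():
    print(key, shape, err)
```

Each embedding has one row per sample and n_components columns, so nothing in the API ties you to two dimensions.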

Is there any other parameter that is critical to tune, or should I use the default settings for the rest of the parameters?

Here are several other parameters you should take care of.

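  • reg, for weight regularization, which is not used in the original LLE paper. If you don't want to use it, simply set it to zero, though the local weight problem may then become ill-conditioned when n_neighbors exceeds the input dimension. The default value of reg is 1e-3, which is already quite small.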
  • eigen_solver. If your data set is small, it is recommended to use dense for accuracy. The arpack solver scales better to large problems but can be unstable for some of them, so it is worth reading up on the trade-off.
  • max_iter. The default value of max_iter is only 100, which often leaves the results unconverged. If the results are not stable, choose a larger integer. Note that max_iter is only used by the arpack solver.
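The parameters in the list above can be set like this. This is a minimal sketch on toy data; the S-curve data set, the sample count, and the specific values (n_neighbors=10, max_iter=1000) are assumptions chosen only to make the example run.

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import LocallyLinearEmbedding

# Toy data (an assumption for this sketch): 1000 points in 3 dimensions.
X, _ = make_s_curve(n_samples=1000, random_state=0)

# Dense eigen-solver: exact, reasonable when the data set is small.
# max_iter is ignored here because it only applies to arpack.
lle_dense = LocallyLinearEmbedding(
    n_neighbors=10,
    n_components=2,
    reg=1e-3,  # the default regularization, written out explicitly
    eigen_solver="dense",
)
Y_dense = lle_dense.fit_transform(X)

# arpack eigen-solver: iterative, so max_iter matters.
lle_arpack = LocallyLinearEmbedding(
    n_neighbors=10,
    n_components=2,
    reg=1e-3,
    eigen_solver="arpack",
    max_iter=1000,   # raised from the default of 100
    random_state=0,  # arpack starts from a random vector
)
Y_arpack = lle_arpack.fit_transform(X)

print(Y_dense.shape, Y_arpack.shape)
```

The two embeddings may differ by sign or small numerical noise, so compare them visually or through a downstream task rather than element by element.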

Upvotes: 2

CodeSsscala

Reputation: 749

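You can use GridSearchCV (Scikit-learn) to choose the best values for you. Since LLE has no score of its own, this works when LLE is a step in a Pipeline that ends in a supervised estimator, or when you supply your own scoring function.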
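A minimal sketch of such a search, assuming the goal is classification accuracy after the embedding. The digits subset, the k-nearest-neighbors classifier, and the grid values are all assumptions made for the example.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Small labelled subset (an assumption) so the search runs quickly.
X, y = load_digits(return_X_y=True)
X, y = X[:300], y[:300]

# LLE has no score method, so it is followed by a classifier
# whose cross-validated accuracy drives the search.
pipe = Pipeline([
    ("lle", LocallyLinearEmbedding(eigen_solver="dense")),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])

# Step-name prefixes ("lle__") route each value to the LLE step.
param_grid = {
    "lle__n_neighbors": [10, 20],
    "lle__n_components": [2, 5],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

The selected values are only the best for this particular downstream classifier and metric; a different task may favor different settings.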

Upvotes: 0
