Reputation: 605
I'm using the locally linear embedding (LLE) method in Scikit-learn for dimensionality reduction. The only examples I could find are in the Scikit-learn documentation here and here, but I'm not sure how I should choose the method's parameters. In particular, is there any relation between the dimension of the data points, or the number of samples, and the number of neighbors (n_neighbors) and number of components (n_components)? All of the examples in Scikit-learn use n_components=2; is this always the case? Finally, is there any other parameter that is critical to tune, or should I use the default settings for the rest of the parameters?
Upvotes: 0
Views: 1768
Reputation: 2295
Is there any relation between the dimension of the data points, or the number of samples, and the number of neighbors (n_neighbors) and number of components (n_components)?
Generally speaking, they are not related. n_neighbors is usually decided by the distances among samples; in particular, if you know the classes of your samples, it is better to set n_neighbors slightly greater than the number of samples in each class. n_components, namely the reduced dimension, is determined by how redundant the data are in the original dimension. Based on the specific data distribution and your own requirements, you can choose a suitable projection dimension.
n_components=2 maps the original high-dimensional space onto a 2D space; it is just a special case, not a requirement.
Is there any other parameter that is critical to tune, or should I use the default settings for the rest of the parameters?
Here are several other parameters you should take care of:

- reg, the weight regularization, which is not used in the original LLE paper. If you don't want to use it, simply set it to zero; note that the default value of reg is 1e-3, which is quite small.
- eigen_solver. If your data set is small, it is recommended to use dense for accuracy. You can do more research on this.
- max_iter. The default value of max_iter is only 100, which often means the solution has not converged. If the results are not stable, choose a larger integer.
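Here is a minimal sketch of how these parameters are passed to LocallyLinearEmbedding; the concrete values below are placeholders, not recommendations:

```python
# Minimal sketch showing the parameters discussed above set explicitly.
from sklearn.datasets import load_iris
from sklearn.manifold import LocallyLinearEmbedding

X, _ = load_iris(return_X_y=True)

lle = LocallyLinearEmbedding(
    n_neighbors=10,
    n_components=2,
    reg=1e-3,               # weight regularization (default); set to 0 to disable it
    eigen_solver="dense",   # more accurate for small data sets
    max_iter=500,           # only used by the "arpack" solver, ignored for "dense"
)
X_embedded = lle.fit_transform(X)
```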
Upvotes: 2
Reputation: 749
You can use GridSearchCV from Scikit-learn to choose the best values for you.
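For example, since LocallyLinearEmbedding has no score method of its own, one common approach is to put it in a Pipeline with a downstream classifier and let GridSearchCV pick the LLE parameters that give the best cross-validated accuracy. This assumes you have labels; the classifier and the grid below are placeholders.

```python
# One possible way to grid-search LLE parameters: wrap LLE in a Pipeline with a
# classifier and score each parameter combination by cross-validated accuracy.
from sklearn.datasets import load_digits
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)

pipe = Pipeline([
    ("lle", LocallyLinearEmbedding()),
    ("knn", KNeighborsClassifier()),
])

param_grid = {
    "lle__n_neighbors": [5, 15, 30],
    "lle__n_components": [2, 5, 10],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```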
Upvotes: 0