Reputation: 11
I'm new to R and I want to do a k-means clustering based on the results of pca. I did like this (taken Iris dataset as an example):
library(tidyverse)
library(FactoMineR)
library(factoextra)
df <- iris %>%
select(- Species)
# compute PCA
res.pca <- PCA(df,
scale.unit = TRUE,
graph = FALSE)
summary(res.pca)
# k-means clustering
kc <- kmeans(res.pca, 3)
Then I got an error:Error in storage.mode(x) <- "double" : a list cannot converted automatically into a 'double'.
The output of the PCA are:
> res.pca
**Results for the Principal Component Analysis (PCA)**
The analysis was performed on 150 individuals, described by 4 variables
*The results are available in the following objects:
name description
1 "$eig" "eigenvalues"
2 "$var" "results for the variables"
3 "$var$coord" "coord. for the variables"
4 "$var$cor" "correlations variables - dimensions"
5 "$var$cos2" "cos2 for the variables"
6 "$var$contrib" "contributions of the variables"
7 "$ind" "results for the individuals"
8 "$ind$coord" "coord. for the individuals"
9 "$ind$cos2" "cos2 for the individuals"
10 "$ind$contrib" "contributions of the individuals"
11 "$call" "summary statistics"
12 "$call$centre" "mean of the variables"
13 "$call$ecart.type" "standard error of the variables"
14 "$call$row.w" "weights for the individuals"
15 "$call$col.w" "weights for the variables"
>
> summary(res.pca)
Call:
PCA(X = df, scale.unit = TRUE, graph = FALSE)
Eigenvalues
Dim.1 Dim.2 Dim.3 Dim.4
Variance 2.918 0.914 0.147 0.021
% of var. 72.962 22.851 3.669 0.518
Cumulative % of var. 72.962 95.813 99.482 100.000
Individuals (the 10 first)
Dist Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3 ctr cos2
1 | 2.319 | -2.265 1.172 0.954 | 0.480 0.168 0.043 | -0.128 0.074 0.003 |
2 | 2.202 | -2.081 0.989 0.893 | -0.674 0.331 0.094 | -0.235 0.250 0.011 |
3 | 2.389 | -2.364 1.277 0.979 | -0.342 0.085 0.020 | 0.044 0.009 0.000 |
4 | 2.378 | -2.299 1.208 0.935 | -0.597 0.260 0.063 | 0.091 0.038 0.001 |
5 | 2.476 | -2.390 1.305 0.932 | 0.647 0.305 0.068 | 0.016 0.001 0.000 |
6 | 2.555 | -2.076 0.984 0.660 | 1.489 1.617 0.340 | 0.027 0.003 0.000 |
7 | 2.468 | -2.444 1.364 0.981 | 0.048 0.002 0.000 | 0.335 0.511 0.018 |
8 | 2.246 | -2.233 1.139 0.988 | 0.223 0.036 0.010 | -0.089 0.036 0.002 |
9 | 2.592 | -2.335 1.245 0.812 | -1.115 0.907 0.185 | 0.145 0.096 0.003 |
10 | 2.249 | -2.184 1.090 0.943 | -0.469 0.160 0.043 | -0.254 0.293 0.013 |
Variables
Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3 ctr cos2
Sepal.Length | 0.890 27.151 0.792 | 0.361 14.244 0.130 | -0.276 51.778 0.076 |
Sepal.Width | -0.460 7.255 0.212 | 0.883 85.247 0.779 | 0.094 5.972 0.009 |
Petal.Length | 0.992 33.688 0.983 | 0.023 0.060 0.001 | 0.054 2.020 0.003 |
Petal.Width | 0.965 31.906 0.931 | 0.064 0.448 0.004 | 0.243 40.230 0.059 |
Could somebody help me with this problem ? What should I put instead of res.pca in the kmeans()? I don't know which part of the PCA results should I extract to use in the fonction kmeans()
Thank you in advance.
Upvotes: 1
Views: 3579
Reputation: 46908
The principal component scores are stored under res.pca$ind$coord
What you want to do kmeans on these:
So we can do:
kc <- kmeans(res.pca$ind$coord, 3)
plot(res.pca$ind$coord[,1:2],col=factor(kc$cluster))
Upvotes: 3
Reputation: 1059
It seems kmeans()
expects a numeric matrix as input, however you are giving to it res.pca
which is a list. Thus you get the error "cannot convert object of type list to double". "Double" is R's class to matrix or vectors of pure numbers.
I'm not sure about the what the PCA function outputs, so you must find a way to extract the PCA values from it, make it a matrix, and then run kmeans.
Hope it helps.
But for future reference, you can do a few things to make your questions easier to help:
Upvotes: 2