Pika

Reputation: 11

compute k-means after PCA

I'm new to R and I want to run k-means clustering on the results of a PCA. Here is what I did (using the iris dataset as an example):


library(tidyverse)

library(FactoMineR)

library(factoextra)

df <- iris %>%
  select(- Species)

# compute PCA

res.pca <- PCA(df, 
               scale.unit = TRUE, 
               graph = FALSE)

summary(res.pca)

# k-means clustering

kc <- kmeans(res.pca, 3)

Then I got an error: Error in storage.mode(x) <- "double" : a list cannot be converted automatically into a 'double'.

The output of the PCA is:

> res.pca
**Results for the Principal Component Analysis (PCA)**
The analysis was performed on 150 individuals, described by 4 variables
*The results are available in the following objects:

   name               description                          
1  "$eig"             "eigenvalues"                        
2  "$var"             "results for the variables"          
3  "$var$coord"       "coord. for the variables"           
4  "$var$cor"         "correlations variables - dimensions"
5  "$var$cos2"        "cos2 for the variables"             
6  "$var$contrib"     "contributions of the variables"     
7  "$ind"             "results for the individuals"        
8  "$ind$coord"       "coord. for the individuals"         
9  "$ind$cos2"        "cos2 for the individuals"           
10 "$ind$contrib"     "contributions of the individuals"   
11 "$call"            "summary statistics"                 
12 "$call$centre"     "mean of the variables"              
13 "$call$ecart.type" "standard error of the variables"    
14 "$call$row.w"      "weights for the individuals"        
15 "$call$col.w"      "weights for the variables"          
> 

> summary(res.pca)

Call:
PCA(X = df, scale.unit = TRUE, graph = FALSE) 


Eigenvalues
                       Dim.1   Dim.2   Dim.3   Dim.4
Variance               2.918   0.914   0.147   0.021
% of var.             72.962  22.851   3.669   0.518
Cumulative % of var.  72.962  95.813  99.482 100.000

Individuals (the 10 first)
                 Dist    Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3    ctr   cos2  
1            |  2.319 | -2.265  1.172  0.954 |  0.480  0.168  0.043 | -0.128  0.074  0.003 |
2            |  2.202 | -2.081  0.989  0.893 | -0.674  0.331  0.094 | -0.235  0.250  0.011 |
3            |  2.389 | -2.364  1.277  0.979 | -0.342  0.085  0.020 |  0.044  0.009  0.000 |
4            |  2.378 | -2.299  1.208  0.935 | -0.597  0.260  0.063 |  0.091  0.038  0.001 |
5            |  2.476 | -2.390  1.305  0.932 |  0.647  0.305  0.068 |  0.016  0.001  0.000 |
6            |  2.555 | -2.076  0.984  0.660 |  1.489  1.617  0.340 |  0.027  0.003  0.000 |
7            |  2.468 | -2.444  1.364  0.981 |  0.048  0.002  0.000 |  0.335  0.511  0.018 |
8            |  2.246 | -2.233  1.139  0.988 |  0.223  0.036  0.010 | -0.089  0.036  0.002 |
9            |  2.592 | -2.335  1.245  0.812 | -1.115  0.907  0.185 |  0.145  0.096  0.003 |
10           |  2.249 | -2.184  1.090  0.943 | -0.469  0.160  0.043 | -0.254  0.293  0.013 |

Variables
                Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3    ctr   cos2  
Sepal.Length |  0.890 27.151  0.792 |  0.361 14.244  0.130 | -0.276 51.778  0.076 |
Sepal.Width  | -0.460  7.255  0.212 |  0.883 85.247  0.779 |  0.094  5.972  0.009 |
Petal.Length |  0.992 33.688  0.983 |  0.023  0.060  0.001 |  0.054  2.020  0.003 |
Petal.Width  |  0.965 31.906  0.931 |  0.064  0.448  0.004 |  0.243 40.230  0.059 |

Could somebody help me with this problem? What should I pass to kmeans() instead of res.pca? I don't know which part of the PCA results I should extract to use in the function kmeans().

Thank you in advance.

Upvotes: 1

Views: 3579

Answers (2)

StupidWolf

Reputation: 46908

The principal component scores are stored in res.pca$ind$coord, and these are what you want to run kmeans on.

So we can do:

kc <- kmeans(res.pca$ind$coord, 3)
plot(res.pca$ind$coord[,1:2],col=factor(kc$cluster))
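
As a variation on the snippet above, here is a minimal sketch that clusters on only the first two dimensions (about 95.8% of the variance according to the summary in the question) and makes the result reproducible. The seed value, `nstart = 25`, and the choice of two dimensions are my own assumptions for illustration, not something the question requires:

```r
library(FactoMineR)

# Same PCA as in the question: standardized variables, no plot.
res.pca <- PCA(iris[, 1:4], scale.unit = TRUE, graph = FALSE)

# Keep only the first two dimensions of the individual coordinates.
scores <- res.pca$ind$coord[, 1:2]

# k-means starts from random centers, so fix the seed and try
# several random starts to get a stable partition.
set.seed(123)
kc <- kmeans(scores, centers = 3, nstart = 25)

# Cross-tabulate the clusters against the known species labels.
table(cluster = kc$cluster, species = iris$Species)

# Plot the individuals on the first two dimensions, colored by cluster.
plot(scores, col = factor(kc$cluster))
```

Whether to keep all dimensions or only the leading ones depends on how much variance you want the clustering to see; with all four dimensions retained, the distances are the same as on the standardized original data.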


Upvotes: 3

JMenezes

Reputation: 1059

kmeans() expects a numeric matrix (or something that can be converted to one) as input, but you are passing it res.pca, which is a list. That is why you get an error saying a list cannot be converted to "double". "Double" is R's storage type for floating-point numbers, which is what a numeric vector or matrix holds.

I'm not sure what the PCA function outputs, so you need to find a way to extract the PCA values from it, make them a matrix, and then run kmeans.
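
One way to find the right piece is to inspect the object before passing it on. The following is only a sketch, assuming the FactoMineR `PCA()` call from the question; the variable name `scores` is mine:

```r
library(FactoMineR)

res.pca <- PCA(iris[, 1:4], scale.unit = TRUE, graph = FALSE)

# Inspect the result: it is a list of components, not a numeric matrix.
class(res.pca)
is.list(res.pca)
names(res.pca)

# The printed description in the question lists $ind$coord as the
# coordinates for the individuals; check that it is a numeric matrix.
scores <- res.pca$ind$coord
is.matrix(scores)
dim(scores)            # one row per individual, one column per dimension
storage.mode(scores)

# A numeric matrix is something kmeans() can work with.
kc <- kmeans(scores, centers = 3)
```

`str(res.pca, max.level = 2)` gives a more compact overview of the same structure.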

Hope it helps.

For future reference, you can do a few things to make your questions easier to answer:

  • Provide a reproducible example (a df with a few lines)
  • Translate error messages to English
  • State which packages the functions come from

Upvotes: 2
