Reputation: 1
Does anybody know how to reproduce this output in R? Below is the cluster output that I want to obtain after doing factor analysis.
Cluster centers   Value 1   Value 2   Value 3   Value 4
FACTOR1            -0.049    -1.481     0.505     0.651
FACTOR2             0.691    -0.161    -0.633    -0.547
FACTOR3             0.251    -0.265     0.611    -1.522
-------------------------------------------------------
No. of cases          257        93       174        96
My data has 620 rows of observations and 20 columns of questions (620x20). I first ran a factor analysis in R and reduced the 20 questions to 3 factors, which produced the 20x3 matrix shown below.
Matrix   Factor 1   Factor 2   Factor 3
Q1          0.646     -0.095      0.041
Q2          0.630      0.047      0.124
Q3            ...        ...        ...
Q4            ...        ...        ...
...
Q20         0.419      0.181      0.337
Next, I want to perform cluster analysis on the 620 observations, with the clusters based on the factor scores, as in the output at the top. I am not sure how to do that in R.
Upvotes: 0
Views: 1356
Reputation: 5000
This is an example. I generated a 30x3 matrix and ran kmeans clustering on it, specifying that 4 clusters are required. Note that you can use any other clustering algorithm. Then I calculated the cluster centers (the mean of each variable by cluster) using aggregate. These centers can then be used to classify a new dataset by finding, for each sample, the center it is closest to (e.g., using Euclidean distance).
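set.seed(1)
d <- matrix(rnorm(90), ncol=3)           # 30 x 3 matrix of random data
kd <- kmeans(d, centers=4)               # k-means with 4 clusters
cluster <- kd$cluster                    # cluster membership of each row
dd <- as.data.frame(cbind(d, cluster))
# column means by cluster, transposed; rows 1 (Group.1) and 5 (cluster) are dropped
t(aggregate(dd, by=list(dd$cluster), FUN=mean))[c(1,5)*-1,]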
         [,1]        [,2]        [,3]       [,4]
V1  0.8321043 -0.01501747 -0.09144934 -1.8916013
V2  0.0121109 -0.51743551  0.85714652 -0.5389448
V3 -0.4478400  0.17132066  0.99685057 -0.9206161
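To connect this with the setup in the question, here is a minimal sketch under a few assumptions: that the factor analysis is done with factanal() and regression factor scores are acceptable, and that k-means with 4 clusters is the clustering wanted. The simulated `responses` matrix is only a stand-in for the real 620x20 data, and `assign_cluster` is a hypothetical helper written for this example, not a function from any package. It clusters the factor scores, prints the centers with factors in rows, counts the cases per cluster, and then assigns rows to the nearest center by Euclidean distance.

```r
# Simulated stand-in for the 620 x 20 questionnaire data (an assumption;
# replace `responses` with the real data).
set.seed(1)
n <- 620
latent <- matrix(rnorm(n * 3), ncol = 3)            # 3 made-up latent traits
loadings <- matrix(0, nrow = 20, ncol = 3)
loadings[cbind(1:20, rep(1:3, length.out = 20))] <- 0.8
responses <- latent %*% t(loadings) + matrix(rnorm(n * 20, sd = 0.6), ncol = 20)
colnames(responses) <- paste0("Q", 1:20)

# Factor analysis with 3 factors; scores = "regression" also returns
# one row of factor scores per observation (620 x 3).
fa <- factanal(responses, factors = 3, scores = "regression")
scores <- fa$scores

# Cluster the factor scores, not the loadings.
km <- kmeans(scores, centers = 4, nstart = 25, iter.max = 100)
centers_table <- round(t(km$centers), 3)            # factors in rows, clusters in columns
cluster_sizes <- table(km$cluster)                  # number of cases per cluster
print(centers_table)
print(cluster_sizes)

# Hypothetical helper: index of the nearest center (Euclidean distance)
# for each row of x.
assign_cluster <- function(x, centers) {
  unname(apply(x, 1, function(row) which.min(colSums((t(centers) - row)^2))))
}
assigned <- assign_cluster(scores, km$centers)
```

Note that km$centers already holds the per-cluster means, so the aggregate step is not strictly needed with kmeans. The cluster numbering and the exact values depend on the random starts, so they will not match the table in the question.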
Upvotes: 1