Reputation: 1
I'm trying to run the two-step cluster analysis known from SPSS in R, since I don't have an SPSS license. I came across the package 'prcr', which has a command for such an analysis. Strangely, I have to specify how many clusters I want, whereas the advantage of the two-step analysis is that it determines the optimal number of clusters in the first step and then applies it in the second step with k-means.
Does anyone know how I can implement this procedure in R?
Here is my code:
library(prcr)
daten.iris <- na.omit(iris)
daten.iris <- scale(daten.iris[, -5])
twostep.res <- create_profiles_cluster(daten.iris, Sepal.Width, Sepal.Length, Petal.Width, Petal.Length, n_profiles = 3)
I appreciate any help.
I have searched nearly everywhere but couldn't find anything.
Upvotes: 0
Views: 842
Reputation: 136
daten.iris <- na.omit(iris)
# Scale the numeric columns of the NA-free data (drops the Species column)
daten.iris <- scale(daten.iris[, -5])
# Convert to data frame
iris_data <- as.data.frame(daten.iris)
# Calculate the total within-cluster sum of squares
# for 1 to 30 clusters
calc_data <- data.frame(
  n_clusters = seq_len(30),
  tot_w_ss = numeric(30)
)
for (i in seq_len(30)) {
  temp_cluster <- kmeans(x = iris_data, centers = i)
  calc_data[i, "tot_w_ss"] <- temp_cluster$tot.withinss
}
# Visualize the result
# plot(calc_data)
# The optimum number of clusters is
# where the tot_w_ss graph makes an elbow.
# Fit the final model with the chosen number of clusters (5 is used here)
iris_cluster <- kmeans(x = iris_data, centers = 5)
# Save the cluster assignment in the scaled data frame
iris_data$cluster <- iris_cluster$cluster
head(iris_data)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width cluster
#> 1   -0.8976739  1.01560199    -1.335752   -1.311052       2
#> 2   -1.1392005 -0.13153881    -1.335752   -1.311052       1
#> 3   -1.3807271  0.32731751    -1.392399   -1.311052       1
#> 4   -1.5014904  0.09788935    -1.279104   -1.311052       1
#> 5   -1.0184372  1.24503015    -1.335752   -1.311052       2
#> 6   -0.5353840  1.93331463    -1.165809   -1.048667       2
Created on 2022-11-08 with reprex v2.0.2
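If reading the elbow off a plot by eye is not enough, the choice of the cluster count can also be automated. The following is only a rough sketch under my own assumptions, not a reimplementation of the SPSS TwoStep procedure: it swaps in the Calinski-Harabasz index (computed from the `kmeans()` output) as the selection criterion, and the search range `2:10`, the seed, and `nstart = 25` are arbitrary choices for illustration.

```r
# Sketch: pick the number of clusters automatically, then run k-means.
# Assumption: the Calinski-Harabasz index stands in for whatever criterion
# SPSS uses; this is NOT the SPSS TwoStep algorithm.
set.seed(42)  # k-means starts from random centres; fix the seed for repeatability

iris_scaled <- as.data.frame(scale(na.omit(iris)[, -5]))
n <- nrow(iris_scaled)
candidate_k <- 2:10  # assumed search range; the index is undefined for k = 1

# Step 1: score every candidate k.
# CH index = (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))
ch_index <- sapply(candidate_k, function(k) {
  fit <- kmeans(iris_scaled, centers = k, nstart = 25)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
})

# The candidate with the highest index is taken as the number of clusters
best_k <- candidate_k[which.max(ch_index)]

# Step 2: final k-means with the selected number of clusters
final_fit <- kmeans(iris_scaled, centers = best_k, nstart = 25)
table(final_fit$cluster)
```

`best_k` then plays the role of the hard-coded `centers` value, and `final_fit$cluster` can be attached to the data frame in the same way as above. Other criteria (for example a BIC-based one) may select a different number of clusters, so it is worth checking the result against the elbow plot.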
Upvotes: 1