manofthousandnames

Reputation: 1

Perform a two-step cluster analysis in R

I'm trying to run the two-step cluster analysis known from SPSS in R, since I don't have an SPSS license. For this I came across the package 'prcr', which has a function for this kind of analysis. Strangely enough, I have to specify how many clusters I want, whereas the advantage of the two-step analysis is that it tries to determine the optimal number of clusters in the first step and then uses it in the second step with k-means.
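A minimal base-R sketch of the procedure described above, under stated assumptions: step 1 is Ward hierarchical clustering (`hclust(method = "ward.D2")`), with the number of clusters chosen by the highest average silhouette width (`silhouette()` from the recommended `cluster` package); step 2 runs `kmeans()` starting from the centroids of that hierarchical solution. The function name `two_step_cluster` and the candidate range `2:8` are hypothetical choices. SPSS's TwoStep procedure works differently (it pre-clusters with a cluster-feature tree and picks the number of clusters with BIC or AIC), so this only mimics the "choose k first, then refine with k-means" idea.

```r
library(cluster)  # silhouette(); ships with R as a recommended package

# Hypothetical helper: hierarchical step picks k, k-means step refines it
two_step_cluster <- function(x, k_range = 2:8) {
  x  <- as.matrix(x)
  d  <- dist(x)                         # Euclidean distances
  hc <- hclust(d, method = "ward.D2")   # step 1: hierarchical clustering

  # Average silhouette width for each candidate number of clusters
  avg_sil <- sapply(k_range, function(k) {
    groups <- cutree(hc, k = k)
    mean(silhouette(groups, d)[, "sil_width"])
  })
  k_best <- k_range[which.max(avg_sil)]

  # Centroids of the chosen hierarchical solution (k_best x p matrix)
  groups  <- cutree(hc, k = k_best)
  centers <- apply(x, 2, function(col) tapply(col, groups, mean))

  # Step 2: k-means started from those centroids
  km <- kmeans(x, centers = centers)
  list(k = k_best, avg_silhouette = setNames(avg_sil, k_range), kmeans = km)
}

daten.iris <- scale(na.omit(iris)[, -5])
res <- two_step_cluster(daten.iris)
res$k                      # number of clusters chosen in step 1
table(res$kmeans$cluster)  # cluster sizes after the k-means step
```

Silhouette width is only one possible criterion for the first step; swapping in another index would change which k is selected.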

Does anyone know how I can implement this procedure in R?

Here you can find my code:

library(prcr)

daten.iris <- na.omit(iris)

daten.iris <- as.data.frame(scale(daten.iris[, -5]))

twostep.res <- create_profiles_cluster(daten.iris, Sepal.Width, Sepal.Length, Petal.Width, Petal.Length, n_profiles = 3)
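If the aim is simply to let the data suggest the number of clusters before a fixed-k call like the one above, one established option (swapped in here, it is not part of prcr) is the gap statistic from the `cluster` package. The sketch assumes k-means as the clustering function, `K.max = 8`, `B = 50` reference data sets and the `firstSEmax` rule; these are illustrative settings, not recommendations. The resulting `k_best` could then presumably be passed as `n_profiles`.

```r
library(cluster)  # clusGap() and maxSE()

set.seed(42)  # the gap statistic simulates random reference data sets
daten.iris <- scale(na.omit(iris)[, -5])

# Gap statistic for k = 1..8; nstart is passed on to kmeans()
gap <- clusGap(daten.iris, FUNcluster = kmeans, K.max = 8, B = 50,
               nstart = 25, verbose = FALSE)

# firstSEmax rule: first k whose gap is within one standard error
# of the first local maximum of the gap curve
k_best <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"],
                method = "firstSEmax")

final <- kmeans(daten.iris, centers = k_best, nstart = 25)
table(final$cluster)
```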

I have searched nearly everywhere but can't find anything. I appreciate any help.

Upvotes: 0

Views: 842

Answers (1)

MarcusCodrescu

Reputation: 136

daten.iris <- 
  na.omit(iris)

daten.iris <-
  scale(daten.iris[, -5])

# Convert to data frame
iris_data <-
  as.data.frame(
    daten.iris
  )

# Calculate the total within sum of squares
# for 1 to 30 clusters
calc_data <-
  data.frame(
    n_clusters = seq_len(30),
    tot_w_ss = numeric(30)
  )

for (i in seq_len(30)){
  
  temp_cluster <-
    kmeans(
      x = iris_data,
      centers = i
    )
  
  calc_data[i, "tot_w_ss"] <-
    temp_cluster$tot.withinss
}

# Visualize the result (uncomment to draw the elbow curve)
# plot(calc_data, type = "b")

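# The optimum number of clusters is where the
# tot_w_ss curve bends (the "elbow"); 5 is used here.
# Note: kmeans() uses random starting centers, so the
# cluster labels can differ between runs unless
# set.seed() is called first.
iris_cluster <-
  kmeans(
    x = iris_data,
    centers = 5
  )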

# Save the cluster assignments in the (scaled) data frame
iris_data$cluster <-
  iris_cluster$cluster

head(iris_data)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width cluster
#> 1   -0.8976739  1.01560199    -1.335752   -1.311052       2
#> 2   -1.1392005 -0.13153881    -1.335752   -1.311052       1
#> 3   -1.3807271  0.32731751    -1.392399   -1.311052       1
#> 4   -1.5014904  0.09788935    -1.279104   -1.311052       1
#> 5   -1.0184372  1.24503015    -1.335752   -1.311052       2
#> 6   -0.5353840  1.93331463    -1.165809   -1.048667       2

Created on 2022-11-08 with reprex v2.0.2
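The elbow in the code above is read off the plot by eye. A rough way to automate that choice, assuming the common "maximum distance to the chord" heuristic (an assumption, not part of this answer), is to rescale the WSS curve and pick the k whose point lies farthest from the straight line joining the first and last points. The helper `elbow_k()` is hypothetical, and `k_max = 10` and `nstart = 25` are illustrative; the chosen k depends on the range of k considered.

```r
set.seed(123)  # k-means uses random starting centers
x <- scale(na.omit(iris)[, -5])

# Total within-cluster sum of squares for k = 1..k_max
k_max <- 10
wss <- vapply(seq_len(k_max),
              function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss,
              numeric(1))

# Hypothetical helper: k farthest from the chord of the rescaled curve
elbow_k <- function(wss) {
  n  <- length(wss)
  k  <- seq_len(n)
  # Rescale both axes to [0, 1] so the distance does not depend on units
  kx <- (k - 1) / (n - 1)
  wy <- (wss - min(wss)) / (max(wss) - min(wss))
  dx <- kx[n] - kx[1]
  dy <- wy[n] - wy[1]
  # Perpendicular distance of every point to the first-to-last line
  d  <- abs(dy * kx - dx * wy + kx[n] * wy[1] - wy[n] * kx[1]) /
        sqrt(dx^2 + dy^2)
  k[which.max(d)]
}

k_best <- elbow_k(wss)
iris_cluster <- kmeans(x, centers = k_best, nstart = 25)
table(iris_cluster$cluster)
```

For a small made-up curve, `elbow_k(c(100, 40, 30, 25, 22, 20))` returns 2, the point right after the steepest drop.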

Upvotes: 1
