PineNuts0

Reputation: 5234

R Programming: Loop through values to create kmeans() clusters of data with different k values

I have the following code:

for (i in 1:5) {

  print(i)

  iris_cluster[i]<- kmeans(iris_data[1:4], i, nstart = 10)
}

kmeans() is documented here: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/kmeans

But I get the following error when I run it:

Error in `[<-.data.frame`(`*tmp*`, i, value = list(cluster = c(`1` = 1L, : replacement element 2 is a matrix/data frame of 1 row, need 150

I'm using the famous Iris dataset that comes with R.

I'm looking to create five dataframes:

iris_cluster1
iris_cluster2
iris_cluster3
iris_cluster4
iris_cluster5

Upvotes: 1

Views: 1152

Answers (1)

akrun

Reputation: 887511

Assuming the dataset is the built-in 'iris', we can store the five kmeans() fits in a named list with lapply

lst1 <- lapply(1:5, function(i) kmeans(iris[1:4], i, nstart = 10))
names(lst1) <- paste0("iris_cluster", 1:5)

and, if separate objects in the global environment are really needed, create them from the list with list2env (not recommended)

list2env(lst1, .GlobalEnv)
iris_cluster1
#K-means clustering with 1 clusters of sizes 150

#Cluster means:
#  Sepal.Length Sepal.Width Petal.Length Petal.Width
#1     5.843333    3.057333        3.758    1.199333

#Clustering vector:
#  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 #[73] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#[145] 1 1 1 1 1 1

#Within cluster sum of squares by cluster:
#[1] 681.3706
# (between_SS / total_SS =   0.0 %)

#Available components:

#[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss" "betweenss"    "size"         "iter"         "ifault"      
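
The for loop from the question can also be kept if each fit is stored in a list rather than a data frame. The error message mentions `[<-.data.frame`, which suggests that iris_cluster was a data frame there, so a whole kmeans object could not be assigned into one column. The following is a minimal sketch of that idea; the name iris_clusters and the set.seed() call are assumptions added for illustration and are not part of the original post.

```r
# Sketch: keep the for loop, but collect the fits in a list.
set.seed(42)  # assumption: fixed seed, only so that reruns give the same fits

iris_clusters <- vector("list", 5)  # hypothetical name; empty list of length 5
for (i in 1:5) {
  # [[i]] stores the whole kmeans object as a single list element
  iris_clusters[[i]] <- kmeans(iris[1:4], centers = i, nstart = 10)
}
names(iris_clusters) <- paste0("iris_cluster", 1:5)

# Look up one fit by name, e.g. the cluster sizes for k = 3
iris_clusters[["iris_cluster3"]]$size
```

Using `[[i]]` instead of `[i]` is the important part: `[[` assigns one object to one slot, whereas `[` tries to spread the components of the kmeans object over several slots.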

If we check the structure of one of the elements, it is a list of class 'kmeans' whose components are vectors or, in the case of 'centers', a matrix. The components can be extracted with $ or [[

str(iris_cluster1)
#List of 9
# $ cluster     : int [1:150] 1 1 1 1 1 1 1 1 1 1 ...
# $ centers     : num [1, 1:4] 5.84 3.06 3.76 1.2
#  ..- attr(*, "dimnames")=List of 2
#  .. ..$ : chr "1"
#  .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
# $ totss       : num 681
# $ withinss    : num 681
# $ tot.withinss: num 681
# $ betweenss   : num 6.82e-13
# $ size        : int 150
# $ iter        : int 1
# $ ifault      : NULL
# - attr(*, "class")= chr "kmeans"

From a single element, the 'withinss' can be extracted as

iris_cluster1$withinss
#[1] 681.3706
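
The same components can be read straight from the list, without first creating separate objects with list2env. This is a small sketch under the assumption that lst1 is built as above; the set.seed() call and the name fit3 are added here for illustration only.

```r
# Sketch: extract components from one element of the list of fits.
set.seed(42)  # assumption: fixed seed for repeatable fits
lst1 <- lapply(1:5, function(i) kmeans(iris[1:4], i, nstart = 10))
names(lst1) <- paste0("iris_cluster", 1:5)

fit3 <- lst1[["iris_cluster3"]]  # hypothetical name; same as lst1$iris_cluster3

fit3$centers      # matrix of cluster means, one row per cluster
fit3[["size"]]    # number of observations in each cluster
fit3$withinss     # within-cluster sum of squares, one value per cluster
```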

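To extract it from every element, we can loop over the list with lapply/sapply. Because the length of 'withinss' differs for each k, the result cannot be simplified to a matrix, so either unlist it or stack it into a two-column data.frame that also returns the cluster name. From the stacked result, the 'values' column can be extracted with either $ or [[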

stack(lapply(lst1, `[[`, 'withinss'))[2:1]
#          ind     values
#1  iris_cluster1 681.370600
#2  iris_cluster2  28.552075
#3  iris_cluster2 123.795876
#4  iris_cluster3  23.879474
#5  iris_cluster3  15.151000
#6  iris_cluster3  39.820968
#7  iris_cluster4  18.703437
#8  iris_cluster4  15.151000
#9  iris_cluster4   9.749286
#10 iris_cluster4  13.624750
#11 iris_cluster5  15.151000
#12 iris_cluster5   9.228889
#13 iris_cluster5   4.655000
#14 iris_cluster5   5.462500
#15 iris_cluster5  11.963784
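
For comparison, here is a sketch of the unlist route mentioned above, together with sapply on 'tot.withinss', which has exactly one value per fit and therefore simplifies to a plain named vector. The names per_cluster and tot, and the set.seed() call, are assumptions made for this illustration.

```r
# Sketch: unlist() and sapply() as alternatives to stack().
set.seed(42)  # assumption: fixed seed for repeatable fits
lst1 <- lapply(1:5, function(i) kmeans(iris[1:4], i, nstart = 10))
names(lst1) <- paste0("iris_cluster", 1:5)

# unlist() flattens everything into one named numeric vector
# (1 + 2 + 3 + 4 + 5 = 15 values in total)
per_cluster <- unlist(lapply(lst1, `[[`, "withinss"))
per_cluster

# 'tot.withinss' is a single number per fit, so sapply() returns
# a named numeric vector of length 5
tot <- sapply(lst1, `[[`, "tot.withinss")
tot
```

With unlist, the list name and the position within each element are pasted together into the vector names, which is harder to read than the separate 'ind' column that stack returns.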

Upvotes: 1
