Reputation: 5234
I have the following code:
for (i in 1:5) {
print(i)
iris_cluster[i]<- kmeans(iris_data[1:4], i, nstart = 10)
}
The documentation for kmeans() is here: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/kmeans
But I get the following error when I run it:
Error in `[<-.data.frame`(`*tmp*`, i, value = list(cluster = c(`1` = 1L, : replacement element 2 is a matrix/data frame of 1 row, need 150
I'm using the famous iris dataset that comes with R.
I'm looking to create five data frames:
iris_cluster1
iris_cluster2
iris_cluster3
iris_cluster4
iris_cluster5
Upvotes: 1
Views: 1152
Reputation: 887511
If the dataset is 'iris', we can create a list with lapply:
lst1 <- lapply(1:5, function(i) kmeans(iris[1:4], i, nstart = 10))
names(lst1) <- paste0("iris_cluster", 1:5)
If we need separate objects in the global environment (not recommended), use list2env:
list2env(lst1, .GlobalEnv)
iris_cluster1
#K-means clustering with 1 clusters of sizes 150
#Cluster means:
# Sepal.Length Sepal.Width Petal.Length Petal.Width
#1 5.843333 3.057333 3.758 1.199333
#Clustering vector:
# [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#[73] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#[145] 1 1 1 1 1 1
#Within cluster sum of squares by cluster:
#[1] 681.3706
# (between_SS / total_SS = 0.0 %)
#Available components:
#[1] "cluster" "centers" "totss" "withinss" "tot.withinss" "betweenss" "size" "iter" "ifault"
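Judging from the error message (`[<-.data.frame`), `iris_cluster` in the question was a data frame, so `iris_cluster[i] <-` tried to spread the components of the kmeans result across columns of 150 rows. As a minimal sketch, the original for loop can be kept if the results go into a list and are assigned with `[[`. The `set.seed(42)` call is an assumption added here only to make the random starts reproducible.

```r
# Sketch: the loop from the question, storing each fit in a list
set.seed(42)  # assumption: fixed seed so repeated runs give the same fits

iris_cluster <- vector("list", 5)  # preallocate an empty list of length 5
for (i in 1:5) {
  # [[<- stores the whole kmeans object as a single list element
  iris_cluster[[i]] <- kmeans(iris[1:4], centers = i, nstart = 10)
}
names(iris_cluster) <- paste0("iris_cluster", 1:5)

# Each element is a complete kmeans fit
class(iris_cluster[["iris_cluster3"]])
```

This gives the same kind of named list as the lapply version, so everything that follows applies to it unchanged.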
If we check the structure of one of the elements with str, it is a named list whose components are either a vector or a matrix. The list elements can be extracted with $ or [[:
str(iris_cluster1)
#List of 9
# $ cluster : int [1:150] 1 1 1 1 1 1 1 1 1 1 ...
# $ centers : num [1, 1:4] 5.84 3.06 3.76 1.2
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr "1"
# .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
# $ totss : num 681
# $ withinss : num 681
# $ tot.withinss: num 681
# $ betweenss : num 6.82e-13
# $ size : int 150
# $ iter : int 1
# $ ifault : NULL
# - attr(*, "class")= chr "kmeans"
From a single element, the 'withinss' can be extracted as
iris_cluster1$withinss
#[1] 681.3706
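A small sketch of the two extraction operators on a single fit. The object name `fit` and the variable `comp` are hypothetical, introduced only for illustration. The practical difference is that `[[` accepts a component name held in a variable, which is what makes it usable inside lapply/sapply.

```r
# Sketch: $ and [[ return the same component of a kmeans object
fit <- kmeans(iris[1:4], centers = 3, nstart = 10)

same <- identical(fit$withinss, fit[["withinss"]])
same  # TRUE

# [[ also works when the component name is stored in a variable
comp <- "size"
fit[[comp]]  # one cluster size per cluster, summing to 150 rows
```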
To extract from the whole list, we can loop over it with lapply/sapply. As the length of 'withinss' differs between elements (one value per cluster), either unlist the result or stack it into a two-column data.frame so that the cluster name is returned as well. From there, the 'values' column can be extracted with either $ or [[:
stack(lapply(lst1, `[[`, 'withinss'))[2:1]
# ind values
#1 iris_cluster1 681.370600
#2 iris_cluster2 28.552075
#3 iris_cluster2 123.795876
#4 iris_cluster3 23.879474
#5 iris_cluster3 15.151000
#6 iris_cluster3 39.820968
#7 iris_cluster4 18.703437
#8 iris_cluster4 15.151000
#9 iris_cluster4 9.749286
#10 iris_cluster4 13.624750
#11 iris_cluster5 15.151000
#12 iris_cluster5 9.228889
#13 iris_cluster5 4.655000
#14 iris_cluster5 5.462500
#15 iris_cluster5 11.963784
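For components that have exactly one value per fit, such as 'tot.withinss', sapply can simplify the result directly to a named vector, and unlist is the flat alternative to stack for the ragged 'withinss' component. A sketch, rebuilding `lst1` so the block runs on its own; the names `tot` and `w` are hypothetical.

```r
# Sketch: sapply for a length-one component, unlist for a ragged one
lst1 <- lapply(1:5, function(i) kmeans(iris[1:4], centers = i, nstart = 10))
names(lst1) <- paste0("iris_cluster", 1:5)

# One total within-cluster sum of squares per fit -> named numeric vector
tot <- sapply(lst1, `[[`, "tot.withinss")
tot

# 'withinss' has i values for the i-th fit, so flatten it with unlist
w <- unlist(lapply(lst1, `[[`, "withinss"))
length(w)  # 1 + 2 + 3 + 4 + 5 = 15
```

Since kmeans uses random starts, the values for two or more clusters can vary slightly between runs unless a seed is set.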
Upvotes: 1