edesz

Reputation: 12406

R convert cluster summary object to dataframe

I am trying to extract the Validation Measures from an R clustering validation object created using clValid.

I create the object and print the full summary with the following code:

library(clValid)

x <- clValid(iris[, -5], nClust=2:10,
        clMethods=c('hierarchical'), validation='internal')
summary(x)

The output of this is:

Clustering Methods:
 hierarchical 

Cluster sizes:
 2 3 4 5 6 7 8 9 10 

Validation Measures:
                                 2       3       4       5       6       7       8       9      10

hierarchical Connectivity   0.0000  4.4770  8.9929 15.4893 18.4183 24.8464 29.8425 36.8567 39.5607
             Dunn           0.3389  0.1378  0.1540  0.1540  0.1668  0.1624  0.1624  0.1915  0.1915
             Silhouette     0.6867  0.5542  0.4720  0.4307  0.3420  0.3707  0.3659  0.3167  0.3083

Optimal Scores:

             Score  Method       Clusters
Connectivity 0.0000 hierarchical 2       
Dunn         0.3389 hierarchical 2       
Silhouette   0.6867 hierarchical 2       

Required output

I am trying to get the Validation Measures as a dataframe like this:

                                2       3       4       5       6       7       8       9      10

hierarchical Connectivity   0.0000  4.4770  8.9929 15.4893 18.4183 24.8464 29.8425 36.8567 39.5607
             Dunn           0.3389  0.1378  0.1540  0.1540  0.1668  0.1624  0.1624  0.1915  0.1915
             Silhouette     0.6867  0.5542  0.4720  0.4307  0.3420  0.3707  0.3659  0.3167  0.3083

Attempt

When I use:

names(summary(x))
attributes(summary(x))

Both calls return:

NULL

I can get the Optimal Scores using optimalScores(x), but the analogous call validationMeasures(x) does not work.

Question

Is there a way to extract the Validation Measures as a data.frame from this summary object?

Upvotes: 1

Views: 527

Answers (1)

Rui Barradas

Reputation: 76402

First of all, you should always try inspecting the object's structure with str:

str(x)
Formal class 'clValid' [package "clValid"] with 14 slots
  ..@ clusterObjs:List of 1
  .. ..$ hierarchical:List of 7
  .. .. ..$ merge      : int [1:149, 1:2] -102 -8 -1 -10 -129 -11 -5 -20 -30 -58 ...
  .. .. ..$ height     : num [1:149] 0 0.1 0.1 0.1 0.1 ...
  .. .. ..$ order      : int [1:150] 42 15 16 33 34 37 21 32 44 24 ...
  .. .. ..$ labels     : NULL
  .. .. ..$ method     : chr "average"
  .. .. ..$ call       : language hclust(d = Dist, method = method)
  .. .. ..$ dist.method: chr "euclidean"
  .. .. ..- attr(*, "class")= chr "hclust"
  ..@ measures   : num [1:3, 1:9, 1] 0 0.339 0.687 4.477 0.138 ...
  .. ..- attr(*, "dimnames")=List of 3
  .. .. ..$ : chr [1:3] "Connectivity" "Dunn" "Silhouette"
  .. .. ..$ : chr [1:9] "2" "3" "4" "5" ...
  .. .. ..$ : chr "hierarchical"
  ..@ measNames  : chr [1:3] "Connectivity" "Dunn" "Silhouette"
  ..@ clMethods  : chr "hierarchical"
  ..@ labels     : chr [1:150] "1" "2" "3" "4" ...
  ..@ nClust     : num [1:9] 2 3 4 5 6 7 8 9 10
  ..@ validation : chr "internal"
  ..@ metric     : chr "euclidean"
  ..@ method     : chr "average"
  ..@ neighbSize : num 10
  ..@ annotation : NULL
  ..@ GOcategory : chr "all"
  ..@ goTermFreq : num 0.05
  ..@ call       : language clValid(obj = iris[, -5], nClust = 2:10, clMethods = c("hierarchical"),      validation = "internal")

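The output shows that the package returns an S4 object, and that one of its slots, measures, holds the values you want. It is a three-dimensional array (measure by number of clusters by clustering method), so index the third dimension by method name: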

x@measures[,,"hierarchical"]
                     2         3         4          5          6          7
Connectivity 0.0000000 4.4769841 8.9928571 15.4892857 18.4182540 24.8464286
Dunn         0.3389087 0.1378257 0.1540416  0.1540416  0.1668323  0.1624158
Silhouette   0.6867351 0.5541609 0.4719936  0.4306700  0.3419904  0.3707424
                      8          9         10
Connectivity 29.8424603 36.8567460 39.5607143
Dunn          0.1624158  0.1914854  0.1914854
Silhouette    0.3658753  0.3166807  0.3082851
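
The slot extraction above returns a numeric matrix rather than a data.frame, so one more step is needed to get what the question asks for. The following is a minimal sketch, assuming the same clValid call as in the question. The variable names m, df and long, and the column names given to long, are illustrative choices and not part of the package.

```r
library(clValid)

# Same object as in the question.
x <- clValid(iris[, -5], nClust = 2:10,
             clMethods = "hierarchical", validation = "internal")

# Selecting one method drops the third dimension and leaves a 3 x 9 matrix.
m <- x@measures[, , "hierarchical"]

# as.data.frame() keeps the column names "2", "3", ... as they are;
# data.frame(m) would rename them to "X2", "X3", ... unless check.names = FALSE.
df <- as.data.frame(m)

# Long format: one row per measure / cluster count / method combination.
# Useful when more than one clustering method was requested.
long <- as.data.frame(as.table(x@measures))
names(long) <- c("measure", "nClust", "method", "value")
```

Here df has the measures as row names and the cluster counts as columns, matching the layout requested in the question. The long form may be easier to filter or plot.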

Upvotes: 4
