How to summarise pam clustering results in R?

Question

If I try to run the code below to get a summary of my clustering results, I get the following error:

Error in UseMethod("mutate_") : no applicable method for 'mutate_' applied to an object of class "table"

This code works if dat is a data frame, but if it is a table I get the above error message. Does anyone have a solution?

    pam_fit <- pam(gower_dist, diss = TRUE, k)  # performs cluster analysis
    pam_results <- dat %>%
      mutate(cluster = pam_fit$clustering) %>%
      group_by(cluster) %>%
      do(the_summary = summary(.))
    pam_results$the_summary

Sample data set:

set.seed(1)
dat <- data.frame(ID = rep(sample(c("a","b","c","d","e","f","g"),10,replace = TRUE),70),
                 disease = sample(c("flu","headache","pain","inflammation","depression","infection","chest pain"),100,replace = TRUE))

dat <- unique(dat)

dat2 <- table(dat)
dat3 <- as.data.frame(dat)

StupidWolf · Accepted Answer

If you look at dat, every ID has multiple observations, and you are trying to partition the IDs into clusters based on their disease column. So your clustering result should have one label per ID, and if you want to summarise your results, you do it per cluster.
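As a rough sketch of why the original call fails (k = 2 is an assumption here, and the names wide and fails_on_table are only for illustration): mutate() has a method for data frames but none for class "table". Plain as.data.frame() on a table gives the long ID / disease / Freq layout, so as.data.frame.matrix() is used below to keep one row per ID.

```r
library(cluster)
library(dplyr)

# Sample data from the question
set.seed(1)
dat <- data.frame(
  ID = rep(sample(c("a", "b", "c", "d", "e", "f", "g"), 10, replace = TRUE), 70),
  disease = sample(c("flu", "headache", "pain", "inflammation",
                     "depression", "infection", "chest pain"),
                   100, replace = TRUE)
)
dat <- unique(dat)
dat2 <- table(dat)  # rows = ID, columns = disease

# mutate() cannot dispatch on a "table", so this call errors
fails_on_table <- inherits(try(mutate(dat2, x = 1), silent = TRUE), "try-error")

# Wide data frame: one row per ID, one column per disease
wide <- as.data.frame.matrix(dat2)

# k = 2 is assumed for illustration
pam_fit <- pam(daisy(wide, metric = "gower"), k = 2, diss = TRUE)

# One cluster label per ID (row of the table), not per row of the long dat
length(pam_fit$clustering) == nrow(wide)
```

So the clustering vector lines up with the rows of the wide table, which is why it can be attached with mutate() once the table has been converted.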

To put the tables together, do:

library(cluster)
library(tidyverse)

pam_fit <- pam(daisy(dat2,"gower"), diss = TRUE, 2)  # performs cluster analysis

pam_results <- as.data.frame.matrix(table(dat)) %>%
  mutate(cluster = pam_fit$clustering) %>%
  group_by(cluster) %>%
  do(the_summary = summary(.), freq = colSums(.))

The freq column then holds the column sums for each cluster:

pam_results$freq
[[1]]
  chest pain   depression          flu     headache    infection inflammation 
           4            5            4            3            5            3 
        pain      cluster 
           5            5 

[[2]]
  chest pain   depression          flu     headache    infection inflammation 
           1            2            2            2            2            2 
        pain      cluster 
           0            4
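The trailing cluster entry in each vector is there because colSums(.) also sums the cluster label column itself. If that is not wanted, one possible base R sketch is to split the wide table by cluster label before the labels are attached (k = 2 and the names wide and freq_by_cluster are assumptions for illustration):

```r
library(cluster)

# Sample data from the question
set.seed(1)
dat <- data.frame(
  ID = rep(sample(c("a", "b", "c", "d", "e", "f", "g"), 10, replace = TRUE), 70),
  disease = sample(c("flu", "headache", "pain", "inflammation",
                     "depression", "infection", "chest pain"),
                   100, replace = TRUE)
)
dat <- unique(dat)

# Wide ID x disease table as a data frame
wide <- as.data.frame.matrix(table(dat))

# k = 2 is assumed for illustration
pam_fit <- pam(daisy(wide, metric = "gower"), k = 2, diss = TRUE)

# Split the rows by cluster label, then sum the disease columns in each piece;
# the labels are never added as a column, so they are not summed
freq_by_cluster <- lapply(split(wide, pam_fit$clustering), colSums)
freq_by_cluster
```

This returns a list with one named vector per cluster, similar in shape to pam_results$freq but without the cluster entry.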

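If you just need the frequency, you can simply do the following (note that dat2[, -1] drops the first disease column, chest pain, so it is missing from the output; use dat2 to keep every column):

aggregate(as.data.frame.matrix(dat2[, -1]), list(cluster = pam_fit$clustering), sum)
  cluster depression flu headache infection inflammation pain
1       1          5   4        3         5            3    5
2       2          2   2        2         2            2    0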

Or a dplyr solution:

as.data.frame.matrix(dat2[, -1]) %>%
  mutate(cluster = pam_fit$clustering) %>%
  group_by(cluster) %>%
  summarize_all(sum)

# A tibble: 2 x 7
  cluster depression   flu headache infection inflammation  pain
    <int>      <int> <int>    <int>     <int>        <int> <int>
1       1          5     4        3         5            3     5
2       2          2     2        2         2            2     0
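In dplyr 1.0.0 and later, summarize_all() is superseded by across(), so the same idea could also be written roughly as below. This is only a sketch: k = 2 and the names wide and by_cluster are assumptions, and it keeps all disease columns instead of dropping the first one.

```r
library(cluster)
library(dplyr)

# Sample data from the question
set.seed(1)
dat <- data.frame(
  ID = rep(sample(c("a", "b", "c", "d", "e", "f", "g"), 10, replace = TRUE), 70),
  disease = sample(c("flu", "headache", "pain", "inflammation",
                     "depression", "infection", "chest pain"),
                   100, replace = TRUE)
)
dat <- unique(dat)

# Wide ID x disease table as a data frame
wide <- as.data.frame.matrix(table(dat))

# k = 2 is assumed for illustration
pam_fit <- pam(daisy(wide, metric = "gower"), k = 2, diss = TRUE)

# Sum every disease column within each cluster
by_cluster <- wide %>%
  mutate(cluster = pam_fit$clustering) %>%
  group_by(cluster) %>%
  summarise(across(everything(), sum))

by_cluster
```

The grouping column is excluded from across(everything(), ...) automatically, so cluster is kept as the key and not summed.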
