Joep_S

Reputation: 537

How to summarise pam clustering results in R?

If I run the code below to get a summary of my clustering results, I get the following error:

Error in UseMethod("mutate_") : no applicable method for 'mutate_' applied to an object of class "table"

This code works if dat is a data frame, but if it is a table I get the above error message. Does anyone have a solution?

    pam_fit <- pam(gower_dist, diss = TRUE, k)  # performs cluster analysis
    pam_results <- dat %>%
      mutate(cluster = pam_fit$clustering) %>%
      group_by(cluster) %>%
      do(the_summary = summary(.))
    pam_results$the_summary

Sample data set:

    set.seed(1)
    dat <- data.frame(ID = rep(sample(c("a","b","c","d","e","f","g"), 10, replace = TRUE), 70),
                      disease = sample(c("flu","headache","pain","inflammation","depression","infection","chest pain"), 100, replace = TRUE))

    dat <- unique(dat)

    dat2 <- table(dat)
    dat3 <- as.data.frame(dat)

Upvotes: 2

Views: 1126

Answers (1)

StupidWolf

Reputation: 46898

If you look at dat, every ID has multiple observations, and you are trying to partition the IDs into clusters based on their disease column. So the clustering result has one entry per unique ID (one per row of dat2), not one per row of dat, and any summary of the results should be done per cluster.
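
A minimal sketch of that point, using a small made-up data set (the IDs, diseases and object names below are illustrative, not the data from the question): the ID-by-disease table has one row per unique ID, and `pam()` returns one cluster label per row of that table.

```r
library(cluster)

# Hypothetical toy data: two observations per ID
toy <- data.frame(
  ID      = c("a", "a", "b", "b", "c", "c", "d", "d"),
  disease = c("flu", "pain", "flu", "pain",
              "headache", "infection", "headache", "infection")
)

# One row per unique ID, one column per disease
toy_tab <- as.data.frame.matrix(table(toy))

# Cluster the IDs on their disease profile
# (daisy() may warn that binary columns are treated as interval scaled)
toy_fit <- pam(daisy(toy_tab, metric = "gower"), diss = TRUE, k = 2)

nrow(toy)                   # number of observations
nrow(toy_tab)               # number of unique IDs
length(toy_fit$clustering)  # one cluster label per unique ID
```

Because the labels line up with the rows of the table, they can be attached to it as a new column, which is what the code further down does.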

To combine the ID-by-disease table with the cluster assignments, convert the table to a data frame first:

    library(cluster)
    library(tidyverse)

    pam_fit <- pam(daisy(dat2, "gower"), diss = TRUE, 2)  # performs cluster analysis

    pam_results <- as.data.frame.matrix(table(dat)) %>%
      mutate(cluster = pam_fit$clustering) %>%
      group_by(cluster) %>%
      do(the_summary = summary(.), freq = colSums(.))

This gives the column sums per cluster:

    pam_results$freq
    [[1]]
      chest pain   depression          flu     headache    infection inflammation 
               4            5            4            3            5            3 
            pain      cluster 
               5            5 

    [[2]]
      chest pain   depression          flu     headache    infection inflammation 
               1            2            2            2            2            2 
            pain      cluster 
               0            4 
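
The error in the question says that `mutate()` has no method for objects of class "table", which is why the table is converted with `as.data.frame.matrix()` before the pipeline. A small sketch with made-up data (the names `toy`, `tab`, `long` and `wide` are illustrative) showing how that conversion differs from a plain `as.data.frame()`:

```r
# Hypothetical toy data (not the data from the question)
toy <- data.frame(
  ID      = c("a", "a", "b", "c"),
  disease = c("flu", "pain", "flu", "pain")
)
tab <- table(toy)

class(tab)  # "table"

# Long format: one row per ID/disease pair, with a Freq column
long <- as.data.frame(tab)

# Wide format: one row per ID, one column per disease
wide <- as.data.frame.matrix(tab)

dim(long)  # 6 rows (3 IDs x 2 diseases), 3 columns
dim(wide)  # 3 rows, 2 columns
```

The wide form keeps one row per ID, so a vector of per-ID cluster labels can be added to it as a column.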

If you just need the frequencies per cluster, you can simply do the following (note that `dat2[,-1]` drops the first disease column, chest pain):

    aggregate(as.data.frame.matrix(dat2[,-1]), list(cluster = pam_fit$clustering), sum)
      cluster depression flu headache infection inflammation pain
    1       1          5   4        3         5            3    5
    2       2          2   2        2         2            2    0
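
As a possible base R alternative for the same per-cluster sums, `rowsum()` adds up the rows of a data frame by group. This is only a sketch on made-up counts; `counts` and `cluster` are hypothetical stand-ins for the converted table and for `pam_fit$clustering`.

```r
# Hypothetical counts: one row per ID, one column per disease
counts <- data.frame(
  flu  = c(1L, 0L, 2L, 1L),
  pain = c(0L, 3L, 1L, 0L),
  row.names = c("a", "b", "c", "d")
)

# Hypothetical cluster labels, standing in for pam_fit$clustering
cluster <- c(1L, 2L, 1L, 2L)

# Sum each disease column within each cluster;
# the cluster labels become the row names of the result
per_cluster <- rowsum(counts, group = cluster)
per_cluster
```

Unlike `aggregate()`, the grouping variable ends up in the row names rather than in a `cluster` column.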

Or a dplyr solution:

    as.data.frame.matrix(dat2[,-1]) %>%
      mutate(cluster = pam_fit$clustering) %>%
      group_by(cluster) %>%
      summarize_all(sum)

    # A tibble: 2 x 7
      cluster depression   flu headache infection inflammation  pain
        <int>      <int> <int>    <int>     <int>        <int> <int>
    1       1          5     4        3         5            3     5
    2       2          2     2        2         2            2     0

Upvotes: 1
