Reputation: 537
If I try to run the code below to get a summary of my clustering results, I get the following error:
Error in UseMethod("mutate_") : no applicable method for 'mutate_' applied to an object of class "table"
This code works if dat is a data frame, but if it is a table I get the above error message. Does anyone have a solution?
pam_fit <- pam(gower_dist, diss = TRUE, k) # performs cluster analysis
pam_results <- dat %>%
  mutate(cluster = pam_fit$clustering) %>%
  group_by(cluster) %>%
  do(the_summary = summary(.))
pam_results$the_summary
Sample data set:
set.seed(1)
dat <- data.frame(ID = rep(sample(c("a","b","c","d","e","f","g"), 10, replace = TRUE), 70),
                  disease = sample(c("flu","headache","pain","inflammation","depression","infection","chest pain"), 100, replace = TRUE))
dat <- unique(dat)
dat2 <- table(dat)
dat3 <- as.data.frame(dat)
Upvotes: 2
Views: 1126
Reputation: 46898
If you look at dat, every ID has multiple observations, and you are trying to partition the IDs into clusters based on their disease column. So the clustering vector has one entry per ID (one per row of dat2), not one per row of dat, and any summary of the results should be computed per cluster.
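As a rough illustration of that shape, here is a minimal base R sketch on a small made-up data set (toy, tab and wide are hypothetical names, not objects from the question). It only checks that the ID-by-disease table has one row per ID once it is converted to a data frame:

```r
# Hypothetical toy data, much smaller than the question's sample data
toy <- data.frame(
  ID      = c("a", "a", "b", "c", "c"),
  disease = c("flu", "pain", "flu", "pain", "headache")
)

tab <- table(toy)                   # ID x disease contingency table
class(tab)                          # a "table", not a data frame

wide <- as.data.frame.matrix(tab)   # one row per ID, one column per disease
class(wide)

# One row per unique ID, so a clustering vector needs one label per row of wide
nrow(wide) == length(unique(toy$ID))
```

A vector of cluster labels can only be added as a column to wide, since toy has more rows than there are IDs.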
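To put the cluster labels and the table together, convert the table to a data frame first, since mutate() has no method for table objects: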
library(cluster)
library(tidyverse)
pam_fit <- pam(daisy(dat2,"gower"), diss = TRUE, 2) # performs cluster analysis
pam_results <- as.data.frame.matrix(table(dat)) %>%
  mutate(cluster = pam_fit$clustering) %>%
  group_by(cluster) %>%
  do(the_summary = summary(.), freq = colSums(.))
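This gives the column sums per cluster (the cluster entry is just the sum of the cluster labels, because . still contains that column):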
pam_results$freq
[[1]]
  chest pain   depression          flu     headache    infection inflammation
           4            5            4            3            5            3
        pain      cluster
           5            5

[[2]]
  chest pain   depression          flu     headache    infection inflammation
           1            2            2            2            2            2
        pain      cluster
           0            4
If you just need the frequencies, you can simply do the following (note that dat2[,-1] drops the first column, chest pain):
aggregate(as.data.frame.matrix(dat2[,-1]), list(cluster = pam_fit$clustering), sum)
  cluster depression flu headache infection inflammation pain
1       1          5   4        3         5            3    5
2       2          2   2        2         2            2    0
Or a dplyr solution:
as.data.frame.matrix(dat2[,-1]) %>%
  mutate(cluster = pam_fit$clustering) %>%
  group_by(cluster) %>%
  summarize_all(sum)
# A tibble: 2 x 7
  cluster depression   flu headache infection inflammation  pain
    <int>      <int> <int>    <int>     <int>        <int> <int>
1       1          5     4        3         5            3     5
2       2          2     2        2         2            2     0
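In dplyr 1.0.0 and later, summarize_all() is superseded by across(). A minimal sketch of the same per-cluster sum in that style, using a made-up count table and made-up cluster labels (counts, clusters and freq_by_cluster are hypothetical names, not objects from the question):

```r
library(dplyr)

# Hypothetical stand-in for as.data.frame.matrix(dat2): one row per ID
counts <- data.frame(
  flu       = c(1, 1, 0),
  headache  = c(0, 0, 1),
  pain      = c(1, 0, 1),
  row.names = c("a", "b", "c")
)

# Assumed labels, standing in for pam_fit$clustering (one label per row)
clusters <- c(1L, 1L, 2L)

freq_by_cluster <- counts %>%
  mutate(cluster = clusters) %>%
  group_by(cluster) %>%
  summarise(across(everything(), sum))  # sums every non-grouping column

freq_by_cluster
```

Because across(everything(), ...) skips the grouping column, the cluster labels are not summed here, unlike in the colSums(.) approach above.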
Upvotes: 1