R Univariate Clustering by Group

Question

I am trying to find a method to cluster univariate data by group. For example, in the data below I have two failure codes (a and b) and 6 data points for each grouping. In the plot you can see that for each failure code there are 2 distinct clusters for failure time. Manually this isn't bad, but I can't figure out how to do this with a larger data set (~100K rows and ~30 codes). I would like for the end result to give me the medoid for each cluster and the count of codes in that cluster.

library(ggplot2)
failure <- rep(c("a","b"),each=6)
ttf <- c(1,1.5,2,5,5.5,6,8,8.5,9,14,14.5,15)
data <- data.frame(failure,ttf)
ggplot(data, aes(failure, ttf)) + geom_point()
results <- data.frame(failure = c("a","b"), m1 = c(1.5,8.5), m2 = c(5.5,14.5))

I would like for the end result to give me something like the table below.

failure m1   m1count  m2    m2count
a       1.5  3        5.5   3
b       8.5  3        14.5  3

emilliman5 · Accepted Answer

This will do what you want, assuming only two clusters per failure group. You can change the number of clusters in the kmeans() call inside tapply(); whatever value you pick is applied to every failure group.

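# Run k-means with 2 centers separately within each failure group
res2 <- tapply(data$ttf, INDEX = data$failure, function(x) kmeans(x, 2))
# Collect each group's cluster centers and sizes into a data frame
res3 <- lapply(names(res2), function(x) data.frame(failure=x, Centers=res2[[x]]$centers, Size=res2[[x]]$size))
# Stack the per-group data frames into one result
res3 <- do.call(rbind, res3)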

res3
   failure Centers Size
1        a     5.5    3
2        a     1.5    3
11       b    14.5    3
21       b     8.5    3
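
Note that kmeans() reports cluster means, which happen to coincide with the medoids for this sample data. If you need actual medoids, as the question asks, one option is pam() from the cluster package. The following is only a sketch under a few assumptions: the helper name summarise_group and the variable k are made up for illustration, k = 2 clusters per failure code is assumed as in the answer above, and the clusters are sorted by medoid value so that the order is stable between runs.

```r
library(cluster)  # recommended package, normally installed alongside R

# Sample data from the question
failure <- rep(c("a", "b"), each = 6)
ttf <- c(1, 1.5, 2, 5, 5.5, 6, 8, 8.5, 9, 14, 14.5, 15)
data <- data.frame(failure, ttf)

k <- 2  # assumed number of clusters per failure code

# Hypothetical helper: medoid and cluster size for one group's values
summarise_group <- function(x, k) {
  fit <- pam(x, k)                              # partitioning around medoids
  med_label <- fit$clustering[fit$id.med]       # cluster label of each medoid
  sizes <- tabulate(fit$clustering, nbins = k)  # observations per cluster label
  ord <- order(x[fit$id.med])                   # sort clusters by medoid value
  data.frame(cluster = seq_len(k),
             medoid = x[fit$id.med][ord],
             count = sizes[med_label][ord])
}

# Cluster each failure code separately, then stack the results
per_group <- lapply(split(data$ttf, data$failure), summarise_group, k = k)
long <- do.call(rbind, lapply(names(per_group), function(g) {
  data.frame(failure = g, per_group[[g]])
}))

# One row per failure code: medoid.1, count.1, medoid.2, count.2
wide <- reshape(long, idvar = "failure", timevar = "cluster", direction = "wide")
wide
```

The wide data frame has one row per failure code, which is close to the layout requested in the question. With ~100K rows, keep in mind that pam() works from pairwise dissimilarities within each group, so very large groups may be slow or memory-hungry; whether that is a problem depends on how the rows are spread over the ~30 codes.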
