Clustering of Count data

Question

I am currently trying to find clusters in a data set that looks like this:

         Dienstag 19 Mittwoch 20 Donnerstag 21 Freitag 22 Montag 25 Dienstag 26 Donnerstag 28
 [1,]           0           0             0          0         0           0            NA
 [2,]           0           0             0          0         0           0            NA
 [3,]           0           0             0          0         0           0            NA
 [4,]           0           0             0          0         1           0            NA
 [5,]           1           0             1          1         1           1            NA
 [6,]           0           0             0          0         0           0            NA
 [7,]           4           0             1          0         2           1            NA
 [8,]           0           1             2          1         0           2            NA
 [9,]           0           0             1          0         0           0            NA
[10,]           1           0             0          0         0           1             0
[11,]           2           0             1          0         0           5             0
[12,]           1           0             0          0         0           1             1
[13,]           0           1             0          0         0           0             0
[14,]           0           0             1          0         4           1             0

It records the number of times a user used an application, by day (columns) and hour (rows).

I want to find patterns/clusters that relate usage to the hour, but I don't know how to go about it. Suggestions for suitable methods would be really helpful.

Tyler Rinker · Accepted Answer

There are statistical approaches to clustering as well, but here's a visual one. I was lazy and used libraries I'm familiar with; this can likely be done more efficiently with base tools.

dat <- read.table(text="         Dienstag.19 Mittwoch.20 Donnerstag.21 Freitag.22 Montag.25 Dienstag.26 Donnerstag.28
 [1,]           0           0             0          0         0           0            NA
 [2,]           0           0             0          0         0           0            NA
 [3,]           0           0             0          0         0           0            NA
 [4,]           0           0             0          0         1           0            NA
 [5,]           1           0             1          1         1           1            NA
 [6,]           0           0             0          0         0           0            NA
 [7,]           4           0             1          0         2           1            NA
 [8,]           0           1             2          1         0           2            NA
 [9,]           0           0             1          0         0           0            NA
[10,]           1           0             0          0         0           1             0
[11,]           2           0             1          0         0           5             0
[12,]           1           0             0          0         0           1             1
[13,]           0           1             0          0         0           0             0
[14,]           0           0             1          0         4           1             0", header=TRUE)


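dat$hour <- factor(1:nrow(dat))   # one row per hour
library(reshape2); library(qdap); library(ggplot2); library(plyr)
dat2 <- melt(dat)   # long format: hour, variable (day), value (count)
dat2[, 2] <- beg2char(dat2[, 2], ".")   # keep the weekday name, drop the date
dat2 <- ddply(dat2, .(variable), transform,
   rescale = scale(value))   # standardise counts within each weekday

ggplot(dat2, aes(variable, hour)) + geom_tile(aes(fill=rescale)) +
   scale_fill_gradient(low = "white", high = "red")
ggsave("heat.png")   # call after the plot; saves the last plot displayed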
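
For the statistical route mentioned at the top, one option is hierarchical clustering of the hourly profiles with base R. This is only a rough sketch, not a recommendation for this particular data. It rests on several assumptions of mine: the day with missing hours is dropped, counts are square-root transformed to dampen the larger values, distances are Euclidean, linkage is average, and the tree is cut into k = 3 groups. Every one of those choices is arbitrary and worth revisiting.

```r
# Counts from the question: rows are hours 1..14, columns are days.
counts <- matrix(c(
  0, 0, 0, 0, 0, 0, NA,
  0, 0, 0, 0, 0, 0, NA,
  0, 0, 0, 0, 0, 0, NA,
  0, 0, 0, 0, 1, 0, NA,
  1, 0, 1, 1, 1, 1, NA,
  0, 0, 0, 0, 0, 0, NA,
  4, 0, 1, 0, 2, 1, NA,
  0, 1, 2, 1, 0, 2, NA,
  0, 0, 1, 0, 0, 0, NA,
  1, 0, 0, 0, 0, 1, 0,
  2, 0, 1, 0, 0, 5, 0,
  1, 0, 0, 0, 0, 1, 1,
  0, 1, 0, 0, 0, 0, 0,
  0, 0, 1, 0, 4, 1, 0
), ncol = 7, byrow = TRUE,
dimnames = list(hour = 1:14,
                day = c("Dienstag.19", "Mittwoch.20", "Donnerstag.21",
                        "Freitag.22", "Montag.25", "Dienstag.26",
                        "Donnerstag.28")))

# Assumption: drop any day that has missing hours (here Donnerstag.28).
complete <- counts[, colSums(is.na(counts)) == 0]

# Assumption: square-root transform so a few large counts do not dominate.
d <- dist(sqrt(complete))             # Euclidean distance between hour profiles
hc <- hclust(d, method = "average")   # average-linkage hierarchical clustering

groups <- cutree(hc, k = 3)           # k = 3 is an arbitrary choice
print(split(names(groups), groups))   # hours grouped by cluster
plot(hc, xlab = "hour")               # dendrogram, to judge a sensible cut
```

With so few days and mostly zero counts, any grouping here is fragile, so I would read the dendrogram alongside the heat map rather than trust the cut on its own.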
