Reputation: 29
I am currently trying to find clusters in a data set that looks like this:
Dienstag 19 Mittwoch 20 Donnerstag 21 Freitag 22 Montag 25 Dienstag 26 Donnerstag 28
[1,] 0 0 0 0 0 0 NA
[2,] 0 0 0 0 0 0 NA
[3,] 0 0 0 0 0 0 NA
[4,] 0 0 0 0 1 0 NA
[5,] 1 0 1 1 1 1 NA
[6,] 0 0 0 0 0 0 NA
[7,] 4 0 1 0 2 1 NA
[8,] 0 1 2 1 0 2 NA
[9,] 0 0 1 0 0 0 NA
[10,] 1 0 0 0 0 1 0
[11,] 2 0 1 0 0 5 0
[12,] 1 0 0 0 0 1 1
[13,] 0 1 0 0 0 0 0
[14,] 0 0 1 0 4 1 0
It corresponds at the counting of times a user used an application given the day and the hour.
I want to find pattern/clusters that relate the usage with the hour, but I don't know how to manage it. It would really be helpful if you could give me some suggestions about methods.
Upvotes: 0
Views: 3029
Reputation: 77454
Most clustering algorithms will assume continuous data. While of course you can "cast" integers to double values, the results will no longer be as meaningful as they were for true continuous values.
I like Tylers visual approach. If there is a meaningful pattern, your brains visual cortex is probably the best tool to discover it.
Upvotes: 0
Reputation: 109874
There are statistical means at clustering as well but here's a visual approach. I was lazy and used libraries I am familiar with to accomplish this goal but it is likely accomplished more efficiently with some base tools.
## dat <- read.table(text=" Dienstag.19 Mittwoch.20 Donnerstag.21 Freitag.22 Montag.25 Dienstag.26 Donnerstag.28
## [1,] 0 0 0 0 0 0 NA
## [2,] 0 0 0 0 0 0 NA
## [3,] 0 0 0 0 0 0 NA
## [4,] 0 0 0 0 1 0 NA
## [5,] 1 0 1 1 1 1 NA
## [6,] 0 0 0 0 0 0 NA
## [7,] 4 0 1 0 2 1 NA
## [8,] 0 1 2 1 0 2 NA
## [9,] 0 0 1 0 0 0 NA
## [10,] 1 0 0 0 0 1 0
## [11,] 2 0 1 0 0 5 0
## [12,] 1 0 0 0 0 1 1
## [13,] 0 1 0 0 0 0 0
## [14,] 0 0 1 0 4 1 0", header=TRUE)
dat$hour <- factor(1:nrow(dat))
library(reshape2); library(qdap); library(ggplot2); library(plyr)
dat2 <- melt(dat)
dat2[, 2] <- beg2char(dat2[, 2], ".")
dat2 <- ddply(dat2, .(variable), transform,
rescale = scale(value))
ggsave("heat.png")
ggplot(dat3, aes(variable, hour)) + geom_tile(aes(fill=rescale)) +
scale_fill_gradient(low = "white", high = "red")
Upvotes: 2