Reputation: 23
I am trying to create a text-output of backup-durations sorted into 30-minute increment bins for 6 of our backup servers. An example of the input data (called newdata) is as follows:
backup_server client duration
1 bkp01 server_A 60
2 bkp01 server_A 34
3 bkp01 server_A 230
4 bkp02 server_A 14
5 bkp02 server_C 29
6 bkp02 server_C 62
Now I've been able to bin everything together with:
br.br <-seq(0,max(newdata$duration),by=30)
cbind(table(cut(newdata$duration,br.br,right=FALSE)))
Which provides this kind of output:
[,1]
[0,30) 3523
[30,60) 1394
[60,90) 230
[90,120) 35
[120,150) 10
[150,180) 0
[180,210) 3
What I'd like to see is something like this:
[,1] bkp01 bkp02
[0,30) 523 422
[30,60) 394 30
[60,90) 130 10
[90,120) 5 3
[120,150) 1 2
[150,180) 0 10
[180,210) 2 20
The closest I got was using the aggregate function but doesn't really do what I need.
> aggregate(newdata$Duration, by=list(newdata$TSM_server),FUN=mean)
Group.1 x
1 bkp01 31.13307
2 bkp02 16.58491
Upvotes: 2
Views: 519
Reputation: 263301
If this is not what you want (and by comparing @joran's solution to mine you should see that there is considerable ambiguity to be resolved regarding what summary measure is desired)....
aggregate(newdata$Duration,
by=list(dur.cut=cut(newdata$duration,br.br,right=FALSE) ,
server=newdata$TSM_server),
FUN=mean)
Then try this:
tapply( newdata$Duration,
INDEX=list(dur.cut=cut(newdata$duration,br.br,right=FALSE) ,
server=newdata$TSM_server),
FUN=mean)
Sometimes setting INDEX= interaction(var1, var2)
produces slightly different and at times more desirable results. ( In testing these I do observe that the column names are different than your example.)
aggregate(newdata$duration,
by=list(dur.cut=cut(newdata$duration,br.br,right=FALSE) ,
server=newdata$backup_server),
FUN=mean)
#------------
dur.cut server x
1 [30,60) bkp01 34.0
2 [60,90) bkp01 60.0
3 [0,30) bkp02 21.5
4 [60,90) bkp02 62.0
tapply( newdata$duration,
INDEX=list(dur.cut=cut(newdata$duration,br.br,right=FALSE) ,
server=newdata$backup_server),
FUN=mean)
#-------------
server
dur.cut bkp01 bkp02
[0,30) NA 21.5
[30,60) 34 NA
[60,90) 60 62.0
[90,120) NA NA
[120,150) NA NA
[150,180) NA NA
[180,210) NA NA
Upvotes: 1
Reputation: 173517
If I'm understanding you correctly, you're looking for counts for each backup server within your time bins. (i.e. I wasn't sure what was up with you attempt using mean
...)
If that's the case, here's one option using dcast
from the reshape2 package:
dat <- read.table(text = " backup_server client duration
1 bkp01 server_A 60
2 bkp01 server_A 34
3 bkp01 server_A 230
4 bkp02 server_A 14
5 bkp02 server_C 29
6 bkp02 server_C 62",sep = "",header = TRUE,row.names = 1)
#cut altered slightly to make more sense with your small example data
dat$dur <- cut(dat$duration,seq(0,max(dat$duration)+30,by = 30),right = FALSE)
dcast(dat,dur~backup_server,fun.aggregate = length,value.var = "dur")
dur bkp01 bkp02
1 [0,30) 0 2
2 [30,60) 1 0
3 [60,90) 1 1
4 [210,240) 1 0
Upvotes: 1