Reputation: 1715
I have been searching around and I cannot figure out how to summarise the data in my data frame (subject to some ranges). I know it can be done with some combination of daply/tapply or table, but I haven't been able to get the exact result I was expecting.
Basically I want to turn this:
part_no val1 val2 val3
2 1 2 3 45.3
2 1 3 4 -12.3
3 1 3 4 99.3
3 1 5 2 -3.2
3 1 4 3 -55.3
Into this:
part_no val3_between0_50 val3_bw50_100 val3_bw-50_0 val3_bw-100_-50
2 1 0 0 1 0
3 0 1 0 1 1
This is dummy data; I have a lot more rows, but the idea is the same. I just want to count the number of values for each participant that meet a certain condition.
If anyone could explain it step by step, I would really appreciate it. I saw lots of different little posts around, but none do exactly this, and my attempts (using table, etc.) only got me halfway there.
Upvotes: 0
Views: 440
Reputation: 263471
Better solution than the one below (it will not need the extra row used below, although if you wanted to move the renaming code to this matrix result, you could):
xtabs(~ part_no + cut(val4, breaks=c(-100, -50, 0, 50, 100)), data=dat)
#-------------
cut(val4, breaks = c(-100, -50, 0, 50, 100))
part_no (-100,-50] (-50,0] (0,50] (50,100]
2 0 1 1 0
3 1 1 0 1
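The renaming mentioned above can probably be folded into the cut/xtabs call through the labels argument of cut. This is only a minimal sketch: it assumes the five rows from the question sit in a data frame called dat with the values in a column named val4 (the name used in this answer), and the label strings and the "range" header are illustrative choices, not anything the question requires.

```r
# The question's five rows; the column name val4 is assumed, as in the answer
dat <- data.frame(part_no = c(2, 2, 3, 3, 3),
                  val4    = c(45.3, -12.3, 99.3, -3.2, -55.3))

brks <- c(-100, -50, 0, 50, 100)
# One label per interval, built from each pair of adjacent break points
labs <- paste("val_btwn", head(brks, -1), tail(brks, -1), sep = "_")

# cut() assigns each value to a bin (right-closed intervals by default);
# xtabs() then counts the rows for every part_no / bin combination
tab <- xtabs(~ part_no + cut(val4, breaks = brks, labels = labs), data = dat)
names(dimnames(tab))[2] <- "range"   # shorter header for the bin dimension
tab
```

Because xtabs counts rows, a participant with several values in the same range gets the actual count rather than just a 0/1 flag.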
First try: .... n to a slightly different problem and would be easy to adapt to your situation. The difficulty I ran into is that my solution requires the part_no to start with 1. You could assign row labels later I suppose. Or make 'part_no' a factor and use its numeric-mode value.
dat <- read.table(text="part_no val1 val2 val3 val4
1 1 2 3 -32
2 1 2 3 45.3
2 1 3 4 -12.3
3 1 3 4 99.3
3 1 5 2 -3.2
3 1 4 3 -55.3
", header=TRUE)
levs= 4; recs <- matrix( c(unique(dat$part_no),
rep(0, levs*length(unique(dat$part_no))) ),
nrow=length(unique(dat$part_no)) )
recs[ cbind( dat$part_no,
1+ findInterval(dat$val4, c(-100, -50, 0, 50, 100) ) )] <- 1
recs
#------------------------------------
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 1 0 0
[2,] 2 0 1 1 0
[3,] 3 1 1 0 1
#------------------------------------
colnames(recs) <- c(names(dat)[1] ,
paste("val_btwn",
c(-100, -50, 0, 50, 100)[1:4],
c(-100, -50, 0, 50, 100)[2:5],
sep="_") )
recs
#------------------------------------
part_no val_btwn_-100_-50 val_btwn_-50_0 val_btwn_0_50 val_btwn_50_100
[1,] 1 0 1 0 0
[2,] 2 0 1 1 0
[3,] 3 1 1 0 1
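The factor idea mentioned at the start of this first try could look roughly like the following. It is a sketch under assumptions, not the code that produced the output above: it uses only the question's five rows (no extra part_no 1 row), keeps part_no as row names instead of a first column, and the names pf, brks and idx are made up for illustration.

```r
# The question's five rows; the column name val4 is assumed, as in the answer
dat  <- data.frame(part_no = c(2, 2, 3, 3, 3),
                   val4    = c(45.3, -12.3, 99.3, -3.2, -55.3))
brks <- c(-100, -50, 0, 50, 100)

# As a factor, part_no gets integer codes 1, 2, ... whatever its actual values
pf   <- factor(dat$part_no)
recs <- matrix(0, nrow = nlevels(pf), ncol = length(brks) - 1,
               dimnames = list(levels(pf), NULL))

# Each row of idx is a (row, column) pair: factor code and interval number
idx <- cbind(as.integer(pf), findInterval(dat$val4, brks))
recs[idx] <- 1   # flags occupied cells; it does not count repeated hits
recs
```

Note that findInterval treats intervals as closed on the left, while cut closes them on the right by default, so a value sitting exactly on a break point lands in different bins under the two approaches.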
And now that I think further, I might use cut and xtabs next time. In fact it worked so well I am going to post it on top.
Upvotes: 2