Counting values in data frame subject to conditions

Question

I have been searching around and cannot figure out how to summarise the data in my data frame (subject to some ranges). I know it can be done with some combination of daply/tapply or table, but I haven't been able to get the exact result I was expecting.

Basically I want to turn this:

part_no val1 val2 val3
2 1 2 3 45.3
2 1 3 4 -12.3
3 1 3 4 99.3
3 1 5 2 -3.2
3 1 4 3 -55.3

Into this:

part_no val3_between0_50 val3_bw50_100 val3_bw-50_0 val3_bw-100_-50
2 1 0 0 1 0
3 0 1 0 1 1

This is dummy data; I have many more rows, but the idea is the same. I just want to count the number of values for each participant that meet a certain condition.

If anyone could explain it step by step, I would really appreciate it. I saw lots of small posts around, but none do exactly this, and my attempts (using table, for example) only got me halfway there.

IRTFM · Accepted Answer

A better solution than the one below (it does not need the extra row used below, although you could move the renaming code to this matrix result if you wanted to):

xtabs(~ part_no + cut(val4, breaks=c(-100, -50, 0, 50, 100)), data=dat)
 #-------------
       cut(val4, breaks = c(-100, -50, 0, 50, 100))
part_no (-100,-50] (-50,0] (0,50] (50,100]
      2          0       1      1        0
      3          1       1      0        1
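Since the question asks for a step-by-step explanation, here is a minimal sketch of what the one-liner above does, split into stages. It assumes the five-row data from the question with the last column named val4, as in the answer's code; the names breaks, bin, counts and wide are illustrative only.

```r
# The question's five rows, with the last column named val4 (an assumption
# taken from the answer's code).
dat <- read.table(text = "part_no val1 val2 val3 val4
2 1 2 3 45.3
2 1 3 4 -12.3
3 1 3 4 99.3
3 1 5 2 -3.2
3 1 4 3 -55.3", header = TRUE)

breaks <- c(-100, -50, 0, 50, 100)

# Step 1: label every val4 with the interval it falls into.
# cut() returns a factor; by default intervals are closed on the right, e.g. (0,50].
bin <- cut(dat$val4, breaks = breaks)

# Step 2: cross-tabulate participant against interval.
# Each cell is the number of rows for that participant in that interval.
counts <- table(part_no = dat$part_no, bin = bin)

# Step 3 (optional): turn the contingency table into an ordinary data frame.
wide <- as.data.frame.matrix(counts)

print(counts)
```

Because cut() keeps all four intervals as factor levels, intervals with no observations still appear as columns of zeros.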

First try: .... n to a slightly different problem and would be easy to adapt to your situation. The difficulty I ran into is that my solution requires the part_no to start with 1. You could assign row labels later I suppose. Or make 'part_no' a factor and use its numeric-mode value.

 dat <- read.table(text="part_no val1 val2 val3 val4
 1 1 2 3 -32
 2 1 2 3 45.3
 2 1 3 4 -12.3
 3 1 3 4 99.3
 3 1 5 2 -3.2
 3 1 4 3 -55.3
 ", header=TRUE)

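 levs <- 4   # number of intervals
 # First column holds part_no; the remaining 'levs' columns start at 0
 recs <- matrix( c(unique(dat$part_no), 
                   rep(0, levs*length(unique(dat$part_no))) ), 
                 nrow=length(unique(dat$part_no)) )
 # Row index = part_no, column index = 1 + interval number; flag that cell with 1
 recs[ cbind( dat$part_no, 
              1+ findInterval(dat$val4, c(-100, -50, 0, 50, 100) ) )] <- 1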
 recs
#------------------------------------
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    1    0    0
[2,]    2    0    1    1    0
[3,]    3    1    1    0    1
#------------------------------------
 colnames(recs) <- c(names(dat)[1] , 
                     paste("val_btwn", 
                            c(-100, -50, 0, 50, 100)[1:4], 
                            c(-100, -50, 0, 50, 100)[2:5], 
                            sep="_") )
 recs
#------------------------------------
     part_no val_btwn_-100_-50 val_btwn_-50_0 val_btwn_0_50 val_btwn_50_100
[1,]       1                 0              1             0               0
[2,]       2                 0              1             1               0
[3,]       3                 1              1             0               1
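The limitation mentioned above (part_no having to start at 1) can be avoided with the factor idea. The following is only a sketch of that suggestion, not the original author's code: it uses the factor's integer codes as row indices, so the extra row for part_no 1 is not needed. The names part, row_idx and col_idx are illustrative.

```r
# Data without the extra part_no == 1 row.
dat <- read.table(text = "part_no val1 val2 val3 val4
2 1 2 3 45.3
2 1 3 4 -12.3
3 1 3 4 99.3
3 1 5 2 -3.2
3 1 4 3 -55.3", header = TRUE)

breaks <- c(-100, -50, 0, 50, 100)

part    <- factor(dat$part_no)               # levels "2" and "3"
row_idx <- as.integer(part)                  # integer codes 1, 1, 2, 2, 2
col_idx <- findInterval(dat$val4, breaks)    # interval number 1..4

# One row per participant, one column per interval, labelled up front.
recs <- matrix(0, nrow = nlevels(part), ncol = length(breaks) - 1,
               dimnames = list(levels(part),
                               paste("val_btwn", head(breaks, -1),
                                     tail(breaks, -1), sep = "_")))

# Flag each (participant, interval) pair that occurs in the data.
recs[cbind(row_idx, col_idx)] <- 1
recs
```

Two caveats: like the original, the assignment flags presence with a 1 rather than counting repeated hits, and findInterval() treats intervals as closed on the left, whereas cut() by default closes them on the right, so values exactly on a break point land in different bins.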

And now that I think further I might use cut and xtabs next time. In fact it worked so well I am going to post it on top.
