Sebastian Zeki
Sebastian Zeki

Reputation: 6874

Count the rows that contain a column where values over or under a certain amount in R

In a column I'd like to count the number of rows over or under a certain value. For example I have the following data:

 1  ch1 1.0e+07 0.431105424
 2  ch1 2.0e+07 1.498178994
 3  ch1 3.0e+07 0.862939286
 4  ch1 4.0e+07 0.735167840
 5  ch1 5.0e+07 0.973566728
 6  ch1 6.0e+07 1.772800459
 7  ch1 7.0e+07 0.298531476
 8  ch1 8.0e+07 0.728161969
 9  ch2 9.0e+07 0.768514386
 10 ch2 1.0e+08 1.481997291
 11 ch2 1.1e+08 0.000000000

I would like to know how many rows have a value over one for each ch value. Can anyone help

Upvotes: 0

Views: 112

Answers (3)

rnso
rnso

Reputation: 24583

Using split:

> split(ddf[,-2], ddf$bb)
$ch1
  aa    cc        dd
1  1 1e+07 0.4311054
2  2 2e+07 1.4981790
3  3 3e+07 0.8629393
4  4 4e+07 0.7351678
5  5 5e+07 0.9735667
6  6 6e+07 1.7728005
7  7 7e+07 0.2985315
8  8 8e+07 0.7281620

$ch2
   aa      cc        dd
9   9 9.0e+07 0.7685144
10 10 1.0e+08 1.4819973
11 11 1.1e+08 0.0000000

> ss = split(ddf[,-2], ddf$bb)
> 
> sapply(ss, function(x) apply(x,2,function(y) sum(y>1)  ) )
   ch1 ch2
aa   7   3
cc   8   3
dd   2   1

Using data.table:

> ddf
    aa  bb      cc        dd
 1:  1 ch1 1.0e+07 0.4311054
 2:  2 ch1 2.0e+07 1.4981790
 3:  3 ch1 3.0e+07 0.8629393
 4:  4 ch1 4.0e+07 0.7351678
 5:  5 ch1 5.0e+07 0.9735667
 6:  6 ch1 6.0e+07 1.7728005
 7:  7 ch1 7.0e+07 0.2985315
 8:  8 ch1 8.0e+07 0.7281620
 9:  9 ch2 9.0e+07 0.7685144
10: 10 ch2 1.0e+08 1.4819973
11: 11 ch2 1.1e+08 0.0000000
> 
> ddf[,apply(.SD, 2, function(x) sum(x>1)),by=bb]
    bb V1
1: ch1  7
2: ch1  8
3: ch1  2
4: ch2  3
5: ch2  3
6: ch2  1

Upvotes: 1

manugupt1
manugupt1

Reputation: 2437

It seems that the dataset you have given has 4 columns

 1  ch1 1.0e+07 0.431105424
 2  ch1 2.0e+07 1.498178994
 3  ch1 3.0e+07 0.862939286
 4  ch1 4.0e+07 0.735167840
 5  ch1 5.0e+07 0.973566728
 6  ch1 6.0e+07 1.772800459
 7  ch1 7.0e+07 0.298531476
 8  ch1 8.0e+07 0.728161969
 9  ch2 9.0e+07 0.768514386
 10 ch2 1.0e+08 1.481997291
 11 ch2 1.1e+08 0.000000000

Assuming that you know the names of the column and let us call the dataset as X and the column you want to compare to be col, I might simply do this

subset(X,X$col<val)

This will give you the subset

you can use

nrow(subset(X,X$col<val))

to get the number of rows in it

Upvotes: 1

cdeterman
cdeterman

Reputation: 19960

plyr::count is another option:

df <- read.table(header=F, text='
   ch1 1.0e+07 0.431105424
   ch1 2.0e+07 1.498178994
   ch1 3.0e+07 0.862939286
   ch1 4.0e+07 0.735167840
   ch1 5.0e+07 0.973566728
   ch1 6.0e+07 1.772800459
   ch1 7.0e+07 0.298531476
   ch1 8.0e+07 0.728161969
   ch2 9.0e+07 0.768514386
   ch2 1.0e+08 1.481997291
   ch2 1.1e+08 0.000000000
                 ')

require(plyr)
count(subset(df, V3>1), vars="V1")

> count(subset(df, V3>1), vars="V1")
   V1 freq
1 ch1    2
2 ch2    1

Upvotes: 1

Related Questions