Reputation: 6874
In a column I'd like to count the number of rows over or under a certain value. For example I have the following data:
1 ch1 1.0e+07 0.431105424
2 ch1 2.0e+07 1.498178994
3 ch1 3.0e+07 0.862939286
4 ch1 4.0e+07 0.735167840
5 ch1 5.0e+07 0.973566728
6 ch1 6.0e+07 1.772800459
7 ch1 7.0e+07 0.298531476
8 ch1 8.0e+07 0.728161969
9 ch2 9.0e+07 0.768514386
10 ch2 1.0e+08 1.481997291
11 ch2 1.1e+08 0.000000000
I would like to know how many rows have a value over one for each ch value. Can anyone help
Upvotes: 0
Views: 112
Reputation: 24583
Using split:
> split(ddf[,-2], ddf$bb)
$ch1
aa cc dd
1 1 1e+07 0.4311054
2 2 2e+07 1.4981790
3 3 3e+07 0.8629393
4 4 4e+07 0.7351678
5 5 5e+07 0.9735667
6 6 6e+07 1.7728005
7 7 7e+07 0.2985315
8 8 8e+07 0.7281620
$ch2
aa cc dd
9 9 9.0e+07 0.7685144
10 10 1.0e+08 1.4819973
11 11 1.1e+08 0.0000000
> ss = split(ddf[,-2], ddf$bb)
>
> sapply(ss, function(x) apply(x,2,function(y) sum(y>1) ) )
ch1 ch2
aa 7 3
cc 8 3
dd 2 1
Using data.table:
> ddf
aa bb cc dd
1: 1 ch1 1.0e+07 0.4311054
2: 2 ch1 2.0e+07 1.4981790
3: 3 ch1 3.0e+07 0.8629393
4: 4 ch1 4.0e+07 0.7351678
5: 5 ch1 5.0e+07 0.9735667
6: 6 ch1 6.0e+07 1.7728005
7: 7 ch1 7.0e+07 0.2985315
8: 8 ch1 8.0e+07 0.7281620
9: 9 ch2 9.0e+07 0.7685144
10: 10 ch2 1.0e+08 1.4819973
11: 11 ch2 1.1e+08 0.0000000
>
> ddf[,apply(.SD, 2, function(x) sum(x>1)),by=bb]
bb V1
1: ch1 7
2: ch1 8
3: ch1 2
4: ch2 3
5: ch2 3
6: ch2 1
Upvotes: 1
Reputation: 2437
It seems that the dataset you have given has 4 columns
1 ch1 1.0e+07 0.431105424
2 ch1 2.0e+07 1.498178994
3 ch1 3.0e+07 0.862939286
4 ch1 4.0e+07 0.735167840
5 ch1 5.0e+07 0.973566728
6 ch1 6.0e+07 1.772800459
7 ch1 7.0e+07 0.298531476
8 ch1 8.0e+07 0.728161969
9 ch2 9.0e+07 0.768514386
10 ch2 1.0e+08 1.481997291
11 ch2 1.1e+08 0.000000000
Assuming that you know the names of the column and let us call the dataset as X and the column you want to compare to be col, I might simply do this
subset(X,X$col<val)
This will give you the subset
you can use
nrow(subset(X,X$col<val))
to get the number of rows in it
Upvotes: 1
Reputation: 19960
plyr::count
is another option:
df <- read.table(header=F, text='
ch1 1.0e+07 0.431105424
ch1 2.0e+07 1.498178994
ch1 3.0e+07 0.862939286
ch1 4.0e+07 0.735167840
ch1 5.0e+07 0.973566728
ch1 6.0e+07 1.772800459
ch1 7.0e+07 0.298531476
ch1 8.0e+07 0.728161969
ch2 9.0e+07 0.768514386
ch2 1.0e+08 1.481997291
ch2 1.1e+08 0.000000000
')
require(plyr)
count(subset(df, V3>1), vars="V1")
> count(subset(df, V3>1), vars="V1")
V1 freq
1 ch1 2
2 ch2 1
Upvotes: 1