Joarder Kamal
Joarder Kamal

Reputation: 1397

R - Counting the number of a specific value in bins

I have a data frame (df) like below:

Value <- c(1,1,0,2,1,3,4,0,0,1,2,0,3,0,4,5,2,3,0,6)
Sl <- c(1:20)
df <- data.frame(Sl,Value)    

> df
   Sl Value
1   1     1
2   2     1
3   3     0
4   4     2
5   5     1
6   6     3
7   7     4
8   8     0
9   9     0
10 10     1
11 11     2
12 12     0
13 13     3
14 14     0
15 15     4
16 16     5
17 17     2
18 18     3
19 19     0
20 20     6

I would like to create 4 bins out of df and count the occurrences of Value=0 grouped by Sl values in a separate data frame like below:

Bin Count
1   1
2   2
3   2
4   1

I was trying to use table and cut to create the desire data frame but its not clear how I'll specify df$Value and the logic to find the 0s here

df.4.cut <- as.data.frame(table(cut(df$Sl, breaks=seq(1,20, by=5))))

Upvotes: 6

Views: 9362

Answers (4)

scoa
scoa

Reputation: 19867

Your question used table(), but it lacked a second argument. It is needed to produce a contingency table. You can find the count of each bin with :

table(cut(df$Sl,4),df$Value)

             0 1 2 3 4 5 6
(0.981,5.75] 1 3 1 0 0 0 0
(5.75,10.5]  2 1 0 1 1 0 0
(10.5,15.2]  2 0 1 1 1 0 0
(15.2,20]    1 0 1 1 0 1 1

And the count of Value == 0 for each bin :

table(cut(df$Sl,4),df$Value)[,"0"] 

(0.981,5.75]  (5.75,10.5]  (10.5,15.2]    (15.2,20] 
           1            2            2            1 

Upvotes: 3

mpalanco
mpalanco

Reputation: 13570

A more convoluted way using sqldf :

First we create a table defining the bins and ranges (min and max):

bins <- data.frame(id = c(1, 2, 3, 4), 
                   bins = c("(0,5]", "(5,10]", "(10,15]", "(15,20]"),
                   min = c(0, 6, 11, 16), 
                   max = c(5, 10, 15, 20))

   id    bins min max
1  1   (0,5]   0   5
2  2  (5,10]   6  10
3  3 (10,15]  11  15
4  4 (15,20]  16  20

Then we use the following query using both tables to bin each sl into its respective group using BETWEEN for those Value equal to 0.

library(sqldf)
sqldf("SELECT bins, COUNT(Value) AS freq FROM df, bins
      WHERE (((sl) BETWEEN [min] AND [max]) AND Value = 0) 
      GROUP BY bins
      ORDER BY id")

Output:

     bins freq
1   (0,5]    1
2  (5,10]    2
3 (10,15]    2
4 (15,20]    1

Another alternative to simplify the construction of bins suggested by mts using cut, extracting the levels of the factor:

bins <- data.frame(id = 1:4, 
                   bins = levels(cut(Sl, breaks = seq(0, 20, 5))),
                   min = seq(1, 20, 5), 
                   max = seq(5, 20, 5))

Upvotes: 2

akrun
akrun

Reputation: 887223

Or using data.table, we convert the 'data.frame' to 'data.table' (setDT(df)), using cut output as the grouping variable, we get the sum of 'Value' that are '0' (!Value). By negating (!), the column will be converted to logical vector i.e. TRUE for 0 and FALSE all other values not equal to 0.

library(data.table)
setDT(df)[,sum(!Value) , .(gr=cut(Sl,breaks=seq(0,20, 5)))]
#        gr V1
#1:   (0,5]  1
#2:  (5,10]  2
#3: (10,15]  2
#4: (15,20]  1

Upvotes: 4

mts
mts

Reputation: 2190

Using your df

tapply(df$Value, cut(df$Sl, 4), function(x) sum(x==0))

gives

> tapply(df$Value, cut(df$Sl, 4), function(x) sum(x==0))
(0.981,5.75]  (5.75,10.5]  (10.5,15.2]    (15.2,20] 
           1            2            2            1 

In cut you can specify the number of breaks or the breaks itself if you prefer and the logic is in the function definition in tapply

Upvotes: 4

Related Questions