Reputation: 1397
I have a data frame (df) like below:
Value <- c(1,1,0,2,1,3,4,0,0,1,2,0,3,0,4,5,2,3,0,6)
Sl <- c(1:20)
df <- data.frame(Sl,Value)
> df
Sl Value
1 1 1
2 2 1
3 3 0
4 4 2
5 5 1
6 6 3
7 7 4
8 8 0
9 9 0
10 10 1
11 11 2
12 12 0
13 13 3
14 14 0
15 15 4
16 16 5
17 17 2
18 18 3
19 19 0
20 20 6
I would like to create 4 bins out of df
and count the occurrences of Value=0
grouped by Sl
values in a separate data frame like below:
Bin Count
1 1
2 2
3 2
4 1
I was trying to use table
and cut
to create the desire data frame but its not clear how I'll specify df$Value
and the logic to find the 0
s here
df.4.cut <- as.data.frame(table(cut(df$Sl, breaks=seq(1,20, by=5))))
Upvotes: 6
Views: 9362
Reputation: 19867
Your question used table()
, but it lacked a second argument. It is needed to produce a contingency table. You can find the count of each bin with :
table(cut(df$Sl,4),df$Value)
0 1 2 3 4 5 6
(0.981,5.75] 1 3 1 0 0 0 0
(5.75,10.5] 2 1 0 1 1 0 0
(10.5,15.2] 2 0 1 1 1 0 0
(15.2,20] 1 0 1 1 0 1 1
And the count of Value == 0
for each bin :
table(cut(df$Sl,4),df$Value)[,"0"]
(0.981,5.75] (5.75,10.5] (10.5,15.2] (15.2,20]
1 2 2 1
Upvotes: 3
Reputation: 13570
A more convoluted way using sqldf
:
First we create a table defining the bins and ranges (min and max):
bins <- data.frame(id = c(1, 2, 3, 4),
bins = c("(0,5]", "(5,10]", "(10,15]", "(15,20]"),
min = c(0, 6, 11, 16),
max = c(5, 10, 15, 20))
id bins min max
1 1 (0,5] 0 5
2 2 (5,10] 6 10
3 3 (10,15] 11 15
4 4 (15,20] 16 20
Then we use the following query using both tables to bin each sl
into its respective group using BETWEEN
for those Value
equal to 0.
library(sqldf)
sqldf("SELECT bins, COUNT(Value) AS freq FROM df, bins
WHERE (((sl) BETWEEN [min] AND [max]) AND Value = 0)
GROUP BY bins
ORDER BY id")
Output:
bins freq
1 (0,5] 1
2 (5,10] 2
3 (10,15] 2
4 (15,20] 1
Another alternative to simplify the construction of bins suggested by mts using cut
, extracting the levels
of the factor:
bins <- data.frame(id = 1:4,
bins = levels(cut(Sl, breaks = seq(0, 20, 5))),
min = seq(1, 20, 5),
max = seq(5, 20, 5))
Upvotes: 2
Reputation: 887223
Or using data.table
, we convert the 'data.frame' to 'data.table' (setDT(df)
), using cut
output as the grouping variable, we get the sum
of 'Value' that are '0' (!Value
). By negating (!
), the column will be converted to logical vector i.e. TRUE
for 0 and FALSE
all other values not equal to 0.
library(data.table)
setDT(df)[,sum(!Value) , .(gr=cut(Sl,breaks=seq(0,20, 5)))]
# gr V1
#1: (0,5] 1
#2: (5,10] 2
#3: (10,15] 2
#4: (15,20] 1
Upvotes: 4
Reputation: 2190
Using your df
tapply(df$Value, cut(df$Sl, 4), function(x) sum(x==0))
gives
> tapply(df$Value, cut(df$Sl, 4), function(x) sum(x==0))
(0.981,5.75] (5.75,10.5] (10.5,15.2] (15.2,20]
1 2 2 1
In cut
you can specify the number of breaks or the breaks itself if you prefer and the logic is in the function definition in tapply
Upvotes: 4