Noor Al-Bader
Noor Al-Bader

Reputation: 1

binned frequency from multiple variables in same column

Using R, I have melted my table into two columns, variable and value. There are multiple variables and values represent lengths. I would like to end up with three columns: variable, fixed pre-determined bins and frequency of each bin per variable.

Example Data:

original table/data frame:

Var Value
a   509
a   1000
b   251
b   330
...

new table/data frame:

Var Bin       Count
a   0-250     50
a   251-500   30
a   501-1000  1
b   0-250     80
b   251-500   55
b   501-1000  30
...

Upvotes: 0

Views: 935

Answers (1)

Uwe
Uwe

Reputation: 42544

To my surprise, I haven't found a duplicate question googling the question's title.

So, here are three possible solutions using cut() implemented in base R, dplyr, and data.table.

# pre-define break points of bins
breaks <- seq(0L, 1000L, by = 250L)

Note that cut() creates left open, right closed intervals by default, e.g,
250 < x <= 500.
For the first interval, the parameter include.lowest = TRUE forces a left closed interval as well, e.g.,
0 <= x <= 250
which seems to be in line with OP's expected output.

base R

with(DF, table(Var, cut(Value, breaks, include.lowest = TRUE)))
Var [0,250] (250,500] (500,750] (750,1e+03]
  a       0         0         1           1
  b       1         2         0           0

dpylr

library(dplyr)
DF %>% 
  group_by(Var, Bin = cut(Value, breaks, include.lowest = TRUE)) %>% 
  count()
# A tibble: 4 x 3
# Groups:   Var, Bin [4]
    Var         Bin     n
  <chr>      <fctr> <int>
1     a   (500,750]     1
2     a (750,1e+03]     1
3     b     [0,250]     1
4     b   (250,500]     2

data.table

library(data.table)
setDT(DF)[, .N, by = .(Var, Bin = cut(Value, breaks, include.lowest = TRUE))] 
   Var         Bin N
1:   a   (500,750] 1
2:   a (750,1e+03] 1
3:   b     [0,250] 1
4:   b   (250,500] 2

Data

library(data.table)
DF <- fread("Var Value
a   509
a   1000
b   0
b   251
b   330")

Upvotes: 1

Related Questions