Reputation: 1
Using R, I have melted my table into two columns, variable and value. There are multiple variables and values represent lengths. I would like to end up with three columns: variable, fixed pre-determined bins and frequency of each bin per variable.
Example Data:
original table/data frame:
Var Value
a 509
a 1000
b 251
b 330
...
new table/data frame:
Var Bin Count
a 0-250 50
a 251-500 30
a 501-1000 1
b 0-250 80
b 251-500 55
b 501-1000 30
...
Upvotes: 0
Views: 935
Reputation: 42544
To my surprise, I haven't found a duplicate question googling the question's title.
So, here are three possible solutions using cut()
implemented in base R, dplyr
, and data.table
.
# pre-define break points of bins
breaks <- seq(0L, 1000L, by = 250L)
Note that cut()
creates left open, right closed intervals by default, e.g,
250 < x <= 500
.
For the first interval, the parameter include.lowest = TRUE
forces a left closed interval as well, e.g.,
0 <= x <= 250
which seems to be in line with OP's expected output.
with(DF, table(Var, cut(Value, breaks, include.lowest = TRUE)))
Var [0,250] (250,500] (500,750] (750,1e+03] a 0 0 1 1 b 1 2 0 0
library(dplyr)
DF %>%
group_by(Var, Bin = cut(Value, breaks, include.lowest = TRUE)) %>%
count()
# A tibble: 4 x 3 # Groups: Var, Bin [4] Var Bin n <chr> <fctr> <int> 1 a (500,750] 1 2 a (750,1e+03] 1 3 b [0,250] 1 4 b (250,500] 2
library(data.table)
setDT(DF)[, .N, by = .(Var, Bin = cut(Value, breaks, include.lowest = TRUE))]
Var Bin N 1: a (500,750] 1 2: a (750,1e+03] 1 3: b [0,250] 1 4: b (250,500] 2
library(data.table)
DF <- fread("Var Value
a 509
a 1000
b 0
b 251
b 330")
Upvotes: 1