Reputation: 85
I have a data frame that is composed of 10 continuous variables:
dat <- data.frame(replicate(10, sample(0:10, 15, replace = TRUE)))
Let's say I want to bin one of the columns by width, so the lowest 1/3 of values would be low, the middle 1/3 of values would be medium, etc.
break_point <- sort(dat$X1)[round(1 * length(dat$X1)/3)]
break_point1 <- sort(dat$X1)[round(2 * length(dat$X1)/3)]
dat$X1 <- cut(dat$X1, breaks = c(-Inf, break_point, break_point1, Inf), labels = c("low", "medium", "high"))
How can I compute this bin for all the columns at the same time?
dat[1:length(dat)] <- lapply(dat[1:length(dat)], cut(breaks = c(-Inf, break_point, break_point1, Inf), labels = c("low", "medium", "high")))
This is what I have, but it's not working. It fails with:
Error in cut.default(breaks = c(-Inf, break_point, break_point1, Inf), : argument "x" is missing, with no default
Upvotes: 0
Views: 532
Reputation: 2262
Try santoku::chop_equally():
library(santoku)
dat[] <- apply(dat, 2, santoku::chop_equally, groups = 3,
               labels = c("low", "medium", "high"))
X1 X2 X3 X4 ......
[1,] "low" "high" "low" "medium" ......
[2,] "high" "high" "low" "low" ......
[3,] "high" "high" "high" "low" ......
......
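Note that this creates separate breakpoints for each column, based on the quantiles of that column. If you always want the same breakpoints, just do:
breaks <- quantile(as.matrix(dat), 0:3/3)
# include.lowest = TRUE keeps the minimum value (equal to the lowest break) from becoming NA
dat[] <- apply(dat, 2, cut, breaks = breaks, include.lowest = TRUE)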
Also, you said you wanted to bin by width (intervals of equal width), but your example bins by quantiles (an equal number of values in each interval). If you want equal-width intervals, use santoku::chop_evenly().
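To make the width-versus-quantile distinction concrete, here is a minimal base R sketch. It uses cut() and quantile() rather than santoku, and the vector x is made-up data chosen so that the two approaches disagree.

```r
# Made-up, skewed data: eight small values and one large outlier
x <- c(0, 1, 2, 3, 4, 5, 6, 7, 20)
labs <- c("low", "medium", "high")

# Equal-width bins: a single number for `breaks` splits range(x)
# into that many intervals of equal length
by_width <- cut(x, breaks = 3, labels = labs)

# Equal-count bins: breaks at the 0, 1/3, 2/3 and 1 quantiles of x;
# include.lowest = TRUE keeps min(x) inside the first interval
q <- quantile(x, probs = 0:3 / 3)
by_count <- cut(x, breaks = q, labels = labs, include.lowest = TRUE)

print(table(by_width))
print(table(by_count))
```

With this data the equal-width version puts seven of the nine values in "low", because the outlier stretches the range, while the quantile version puts three values in each bin.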
Upvotes: 0
Reputation: 887741
We may need a lambda (anonymous) function:
dat[] <- lapply(dat, function(x) cut(x,
breaks = c(-Inf, break_point, break_point1, Inf),
labels = c("low", "medium", "high")))
Or simply pass cut itself as the function, rather than calling cut(...), and supply its other arguments by name:
dat[] <- lapply(dat, cut,
breaks = c(-Inf, break_point, break_point1, Inf),
labels = c("low", "medium", "high"))
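Both versions above reuse break_point and break_point1, which were computed from X1 only. If each column should instead get its own break points, one possible sketch is the following. The helper name bin_thirds is hypothetical, and the data frame is made-up deterministic data so that the result is reproducible.

```r
# Made-up deterministic data (not the random data from the question)
dat <- data.frame(X1 = 1:15, X2 = seq(10, 150, by = 10))

# bin_thirds() is a hypothetical helper: it recomputes the two break
# points for the column it receives, using the rule from the question
bin_thirds <- function(x) {
  s <- sort(x)
  n <- length(x)
  cut(x,
      breaks = c(-Inf, s[round(n / 3)], s[round(2 * n / 3)], Inf),
      labels = c("low", "medium", "high"))
}

# dat[] keeps the data frame structure; lapply() returns one factor per column
dat[] <- lapply(dat, bin_thirds)
print(table(dat$X1))
```

One caveat: if a column has many ties, the two break points can coincide, and cut() then stops with an error because the breaks are not unique.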
Upvotes: 0