RoyBatty279

Reputation: 85

Apply cut function to all the columns of a dataframe

I have a data frame that is composed of 10 continuous variables:

dat <- data.frame(replicate(10, sample(0:10,15,rep=TRUE)))

Let's say I want to bin one of the columns by width, so the lowest 1/3 of values would be low, the middle 1/3 of values would be medium, etc.

break_point <- sort(dat$X1)[round(1 * length(dat$X1)/3)]
break_point1 <- sort(dat$X1)[round(2 * length(dat$X1)/3)]

dat$X1 <- cut(dat$X1, breaks = c(-Inf, break_point, break_point1, Inf), labels = c("low", "medium", "high"))

How can I compute this bin for all the columns at the same time?

dat[1:length(dat)] <- lapply(dat[1:length(dat)], cut(breaks = c(-Inf, break_point, break_point1, Inf), labels = c("low", "medium", "high")))

This is what I have, but it's not working. As it says

Error in cut.default(breaks = c(-Inf, break_point, break_point1, Inf), : argument "x" is missing, with no default

Upvotes: 0

Views: 532

Answers (2)

dash2

Reputation: 2262

Try santoku::chop_equally():

library(santoku)
dat[] <- apply(dat, 2, santoku::chop_equally, groups = 3,
               labels = c("low", "medium", "high"))

      X1       X2       X3       X4       ......
 [1,] "low"    "high"   "low"    "medium" ......
 [2,] "high"   "high"   "low"    "low"    ......
 [3,] "high"   "high"   "high"   "low"    ......
 ......

Note that this creates separate breakpoints for each column, based on the quantiles of the column. If you always want the same breakpoints, just do

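breaks <- quantile(as.matrix(dat), 0:3/3)
# include.lowest = TRUE keeps the overall minimum in the first bin instead of NA
dat[] <- apply(dat, 2, cut, breaks = breaks, include.lowest = TRUE)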
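One thing to be aware of with apply() here: it works on a matrix, so the factors returned by cut() come back as plain character strings (which is why the output above is quoted). If you want factor columns, lapply() is an option. A small sketch on made-up data; the toy data frame, the hand-picked breaks and the names via_apply / via_lapply are only for illustration:

```r
# Toy data and hand-picked breaks, both hypothetical.
toy <- data.frame(X1 = c(0, 2, 4, 6, 8, 10), X2 = c(1, 3, 5, 7, 9, 10))
breaks <- c(-Inf, 3, 7, Inf)
labs <- c("low", "medium", "high")

# apply() coerces to a matrix, so the factors collapse to character strings.
via_apply <- toy
via_apply[] <- apply(toy, 2, cut, breaks = breaks, labels = labs)

# lapply() works column by column and keeps each result as a factor.
via_lapply <- toy
via_lapply[] <- lapply(toy, cut, breaks = breaks, labels = labs)

sapply(via_apply, class)
sapply(via_lapply, class)
```

Both versions should put the same label in each cell; only the column type differs.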

Also, you said you wanted to chop by width of intervals (equal width of each interval), but your example is chopping by quantiles (equal numbers of cells in each interval). If you want width of intervals, use santoku::chop_evenly().
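To make the difference concrete, here is a rough base R sketch on a made-up vector with one large value. The vector x and the names by_width / by_count are just for illustration, and base cut() stands in for the santoku functions:

```r
# Hypothetical vector with one large value, to make the two schemes differ.
x <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 20)
labs <- c("low", "medium", "high")

# Equal width: a single number for breaks splits range(x) into that many
# intervals of the same length.
by_width <- cut(x, breaks = 3, labels = labs)

# Equal counts: break at the 1/3 and 2/3 quantiles instead.
by_count <- cut(x, breaks = quantile(x, 0:3 / 3), labels = labs,
                include.lowest = TRUE)

table(by_width)
table(by_count)
```

With equal widths most of the values land in "low", because the single large value stretches the range; with quantile breaks the three groups come out roughly the same size.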

Upvotes: 0

akrun

Reputation: 887741

We need a lambda function here, because lapply() expects a function, not a call to cut() with x missing:

dat[] <- lapply(dat, function(x) cut(x,
    breaks = c(-Inf, break_point, break_point1, Inf),
    labels = c("low", "medium", "high")))

Or simply pass cut itself (without the parentheses) and give the other arguments by name; lapply() forwards them to cut():

dat[] <- lapply(dat, cut,
    breaks = c(-Inf, break_point, break_point1, Inf),
    labels = c("low", "medium", "high"))
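Note that both versions reuse break_point and break_point1, which were computed from X1, for every column. If each column should get its own breakpoints, one option is to move the breakpoint calculation from the question into a small function and lapply() that. A minimal sketch; bin_thirds is a made-up helper name and the data frame is a toy example:

```r
# Hypothetical helper: bin one numeric vector into thirds, using the
# rank-based breakpoints from the question.
bin_thirds <- function(x) {
  s <- sort(x)
  n <- length(x)
  lower <- s[round(n / 3)]
  upper <- s[round(2 * n / 3)]
  cut(x, breaks = c(-Inf, lower, upper, Inf),
      labels = c("low", "medium", "high"))
}

# Toy data: two columns on different scales.
toy <- data.frame(X1 = c(0, 1, 2, 3, 4, 5, 6, 7, 8),
                  X2 = c(50, 10, 90, 30, 70, 20, 80, 40, 60))
toy[] <- lapply(toy, bin_thirds)
toy
```

This assumes the columns are still numeric (a column that was already cut into a factor would need to be skipped) and that the two breakpoints differ; with many ties they can coincide, and cut() will then complain that the breaks are not unique.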

Upvotes: 0
