Is there a more efficient way to write multiple equally spaced break points for R's cut() function?

Question

This is what I've done. It gives the result I want, but in a very inefficient way:

cut(df1$wage,
    breaks = c(-Inf, 20000, 21000, 22000, 23000, 24000, 25000, 26000, 27000, 28000, 29000, 30000, Inf),
    include.lowest = TRUE, dig.lab = 10,
    labels = c("-20 000", "20 000-21 000", "21 000-22 000", "22 000-23 000", "23 000-24 000",
               "24 000-25 000", "25 000-26 000", "26 000-27 000", "27 000-28 000", "28 000-29 000",
               "29 000-30 000", "30 000-"))

I want a lowest bin that includes all values up to a specified value (20 000 in the example), and likewise a top bin for all values above 30 000.
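With -Inf and Inf as the outer breaks, cut()'s default right = TRUE makes every interval closed on the right, (a, b]. The small check below (the values in x are made up for illustration) shows where values sitting exactly on the limits end up: 20 000 goes into the lowest bin and 30 000 into the 29 000-30 000 bin, so the top bin only holds values strictly above 30 000.

```r
# Breaks as in the question: open-ended bins below 20000 and above 30000
breaks <- c(-Inf, seq(from = 20000, to = 30000, by = 1000), Inf)

# Illustrative values, including ones sitting exactly on the limits
x <- c(15000, 20000, 20001, 30000, 30001)

# Default right = TRUE: every interval is (a, b], closed on the right
bins <- cut(x, breaks = breaks, dig.lab = 10)

as.integer(bins)  # 1 1 2 11 12
```

If values equal to 20 000 should instead start the second bin, passing right = FALSE switches every interval to [a, b).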

I would also like to be able to change the step length between break points (1000 in the example) to, say, 500, without having to spell out every break point explicitly.

Ideally, I would also like the labels to follow the break points I specify; writing them out by hand is just as inefficient.

For the breaks part, I came close with breaks = seq(from = 20000, to = 30000, by = 1000), but couldn't figure out how to also include the bottom and top bins as in the example above.
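That seq() call already produces every interior break point, and changing by is all it takes to go from a step of 1000 to 500. One thing worth checking, sketched here with illustrative variable names (lower, upper, step): seq() only reaches to when the range is a whole multiple of by. Otherwise the last break point, and therefore the start of the top bin, falls short of the intended upper limit.

```r
lower <- 20000
upper <- 30000

# A step that divides the range evenly reaches the upper limit
breaks_500 <- seq(from = lower, to = upper, by = 500)
length(breaks_500)   # 21 break points, i.e. 20 bins between 20000 and 30000
tail(breaks_500, 1)  # 30000

# A step that does not divide the range stops short of the upper limit
breaks_700 <- seq(from = lower, to = upper, by = 700)
tail(breaks_700, 1)  # 29800, so the top bin would start there instead of at 30000

# Simple guard before building the bins
step <- 700
(upper - lower) %% step == 0  # FALSE
```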

Ronak Shah · Accepted Answer

You can store the breaks in a vector and reuse it for both the breaks and the labels:

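breaks <- seq(from = 20000, to = 30000, by = 1000)

# Open-ended bins below the first and above the last break; all labels come from the same vector
cut(df1$wage, breaks = c(-Inf, breaks, Inf), include.lowest = TRUE, dig.lab = 10,
    labels = c(paste0("-", breaks[1]),
               paste(head(breaks, -1), tail(breaks, -1), sep = "-"),
               paste0(tail(breaks, 1), "-")))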
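Building on the same idea, the whole thing can be wrapped in a small function so that only the limits and the step change between calls. This is a sketch, not part of the original answer: the function name cut_steps, its arguments and the sample wage vector are hypothetical. It uses format() with big.mark = " " to reproduce the "20 000" style labels from the question and to avoid scientific notation such as "1e+05", which plain paste() produces for 100000. It also stops if the range is not a whole multiple of the step.

```r
# Hypothetical helper: bins x into an open-ended bottom bin (<= lower),
# equal-width bins of size `step` between lower and upper, and an open-ended top bin (> upper)
cut_steps <- function(x, lower, upper, step, big.mark = " ") {
  if ((upper - lower) %% step != 0) {
    stop("(upper - lower) must be a multiple of step")
  }
  breaks <- seq(from = lower, to = upper, by = step)
  # format() avoids scientific notation (e.g. "1e+05") and adds a thousands separator
  lab <- format(breaks, big.mark = big.mark, scientific = FALSE, trim = TRUE)
  labels <- c(paste0("-", lab[1]),
              paste(head(lab, -1), tail(lab, -1), sep = "-"),
              paste0(lab[length(lab)], "-"))
  cut(x, breaks = c(-Inf, breaks, Inf), labels = labels)
}

# Hypothetical sample data
wage <- c(15000, 20000, 20500, 29999, 30000, 45000)
r1 <- cut_steps(wage, lower = 20000, upper = 30000, step = 1000)
r2 <- cut_steps(wage, lower = 20000, upper = 30000, step = 500)
levels(r1)
```

With step = 1000 the levels match the hand-written labels in the question. Since the labels are supplied explicitly, dig.lab (which only affects automatically generated labels) is not needed, and include.lowest only matters for values equal to -Inf, so both are left out.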
