Åskan

Reputation: 25

Is there a more efficient way to write multiple equally spaced break points in R's cut function?

This is what I've done. It gives the result I want, but in a very inefficient way.

cut(df1$wage, breaks = c(-Inf, 20000,21000,22000,23000,24000,25000,26000,27000,28000,29000,30000, Inf), 
         include.lowest=TRUE, dig.lab=10, labels = c("-20 000", "20 000-21 000", "21 000-22 000", "22 000-23 000", "23 000-24 000",
                                                    "24 000-25 000", "25 000-26 000", "26 000-27 000", "27 000-28 000", "28 000-29 000", "29 000-30 000", "30 000-"))

I want a lowest bin that includes all values up to some specified value (20 000 in the example), and likewise a top bin for all values above 30 000.

I would also like to be able to vary the step length between the break points, which is 1000 in the example, to say 500, without having to specify every break point explicitly.

Ideally, the labels should also follow the break points I specify; writing them out by hand is just as inefficient.

For the breaks part I came close with breaks = seq(from = 20000, to = 30000, by = 1000), but I couldn't figure out how to also include the bottom and top bins as in the example above.

Upvotes: 2

Views: 70

Answers (1)

Ronak Shah

Reputation: 388807

You can store the breaks in a vector and use it in both breaks and labels:

breaks <- seq(from = 20000, to = 30000, by = 1000)

cut(df1$wage, breaks = c(-Inf, breaks, Inf), include.lowest=TRUE, dig.lab=10, 
 labels = c(-20000, paste(head(breaks, -1), tail(breaks, -1), sep = "-"), "30000-"))
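To change the step length without touching the labels, the same idea can be wrapped in a small function. This is only a sketch: cut_with_tails and its arguments lower, upper and step are hypothetical names, not part of base R, and the sample wage values are made up. The labels are built from the breaks vector itself, and format() is used so that large values are not printed in scientific notation.

```r
# Hypothetical helper (illustrative name): bins x into
# (-Inf, lower], (lower, lower + step], ..., (last break, Inf]
# and derives the labels from the same breaks vector.
cut_with_tails <- function(x, lower, upper, step) {
  breaks <- seq(from = lower, to = upper, by = step)
  # Convert breaks to text once; scientific = FALSE avoids labels like "1e+05"
  txt <- format(breaks, scientific = FALSE, trim = TRUE)
  labels <- c(
    paste0("-", txt[1]),                             # open-ended bottom bin
    paste(head(txt, -1), tail(txt, -1), sep = "-"),  # interior bins
    paste0(txt[length(txt)], "-")                    # open-ended top bin
  )
  cut(x, breaks = c(-Inf, breaks, Inf), include.lowest = TRUE, labels = labels)
}

# Made-up sample data, binned with a step of 500 instead of 1000
wage <- c(15000, 20000, 20500, 29999, 30000, 45000)
binned <- cut_with_tails(wage, lower = 20000, upper = 30000, step = 500)
table(binned)
```

Because cut uses right-closed intervals by default, a value exactly equal to a break point (such as 20000 here) falls into the bin below it. Passing big.mark = " " to format() should reproduce the "20 000" style of the labels in the question.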

Upvotes: 1
