Reputation: 25
This is what I've done. It gives the result I want, but in a very inefficient way:
cut(df1$wage, breaks = c(-Inf, 20000,21000,22000,23000,24000,25000,26000,27000,28000,29000,30000, Inf),
include.lowest=TRUE, dig.lab=10, labels = c("-20 000", "20 000-21 000", "21 000-22 000", "22 000-23 000", "23 000-24 000",
"24 000-25 000", "25 000-26 000", "26 000-27 000", "27 000-28 000", "28 000-29 000", "29 000-30 000", "30 000-"))
I want a lowest bin that includes all values up to some specified value (20 000 in the example), and likewise a top bin for all values above 30 000.
I would also like to be able to vary the step length between the break points, which is 1000 in the example, to say 500, without having to list every break point explicitly.
Ideally the labels should also follow the break points I specify, since writing them out by hand is just as inefficient.
For the breaks part I came close with breaks = (seq(from = 20000, to = 30000, by = 1000))
but I couldn't figure out how to also include the bottom and top bins as in the example above.
Upvotes: 2
Views: 70
Reputation: 388807
You can store the breaks in a vector and use it in both breaks and labels:
breaks <- seq(from = 20000, to = 30000, by = 1000)
cut(df1$wage, breaks = c(-Inf, breaks, Inf), include.lowest=TRUE, dig.lab=10,
labels = c("-20000", paste(head(breaks, -1), tail(breaks, -1), sep = "-"), "30000-"))
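If you want to change the step length in one place, the same idea can be wrapped in a small function. This is only a sketch: cut_open_ended and its arguments from, to and by are names made up for illustration, and the space used as thousands separator is an assumption based on the labels in the question. The first and last labels are built from the break vector itself, so they stay in sync when the range or the step changes.

```r
# Hypothetical helper (not part of base R): bins x into equal-width
# intervals between `from` and `to`, plus one open-ended bin at each end.
cut_open_ended <- function(x, from, to, by) {
  inner <- seq(from = from, to = to, by = by)
  # Format once so labels never switch to scientific notation;
  # big.mark = " " mimics the "20 000" style used in the question.
  txt <- format(inner, big.mark = " ", scientific = FALSE, trim = TRUE)
  labels <- c(
    paste0("-", txt[1]),                              # everything up to the first break
    paste(head(txt, -1), tail(txt, -1), sep = "-"),   # the regular bins
    paste0(txt[length(txt)], "-")                     # everything above the last break
  )
  cut(x, breaks = c(-Inf, inner, Inf), labels = labels, include.lowest = TRUE)
}

# Example with made-up wages and a step of 500 instead of 1000
wage <- c(15000, 20400, 25250, 31000)
table(cut_open_ended(wage, from = 20000, to = 30000, by = 500))
```

Note that cut() uses right-closed intervals by default, so a wage of exactly 20 000 ends up in the lowest bin and exactly 30 000 in the "29 500-30 000" bin. Pass right = FALSE to cut() if you want the opposite convention.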
Upvotes: 1