Reputation: 144
I'd like to create a categorical variable that assigns each value to a bin. So for data like:
x <- floor(runif(50,0,40))
The categories will be:
g1 <- (x >= 0) & (x <= 10)
g2 <- (x >= 11) & (x <= 20)
g3 <- (x >= 21) & (x <= 30)
g4 <- (x >= 31)
The variable should then check x against the categories and assign each observation to a bin. Is there a way to do this in a single variable? Apologies if this has been asked before; I couldn't find anything on this specific case.
Upvotes: 0
Views: 449
Reputation: 160407
set.seed(42)
x <- floor(runif(50,0,40))
head(x)
# [1] 36 37 11 33 25 20
head(cut(x, c(0, 10, 20, 30, Inf), include.lowest = TRUE))
# [1] (30,Inf] (30,Inf] (10,20] (30,Inf] (20,30] (10,20]
# Levels: [0,10] (10,20] (20,30] (30,Inf]
head(cut(x, c(0, 10, 20, 30, Inf), labels = FALSE, include.lowest = TRUE))
# [1] 4 4 2 4 3 2
The default is to give you a factor (first example), which is generally fine for most uses. The second form is for when you need an integer instead. It has the same effect, though, in that all numbers between (say) 0 and 10 get the same value out of cut (a 1 in this case).
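As a quick sanity check that these breaks reproduce the four conditions from the question, something like the following should work. It is only a sketch: bin is an illustrative name, and it relies on x holding whole numbers, so that (10,20] covers the same values as 11 to 20.

```r
set.seed(42)
x <- floor(runif(50, 0, 40))

# integer bin codes from cut(), as in the second example above
bin <- cut(x, c(0, 10, 20, 30, Inf), labels = FALSE, include.lowest = TRUE)

# the four conditions from the question
g1 <- (x >= 0)  & (x <= 10)
g2 <- (x >= 11) & (x <= 20)
g3 <- (x >= 21) & (x <= 30)
g4 <- (x >= 31)

# each condition should line up with exactly one bin code
all((bin == 1) == g1)
all((bin == 2) == g2)
all((bin == 3) == g3)
all((bin == 4) == g4)
```

All four comparisons should come back TRUE. With non-integer data the match would break, since a value such as 10.5 falls in (10,20] but satisfies none of the question's conditions.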
In your case, I think you want the "g1" labels, so instead of labels=FALSE, specify the labels manually (as @Ben just suggested):
head(cut(x, c(0, 10, 20, 30, Inf), labels = paste0("g", 1:4), include.lowest = TRUE))
# [1] g4 g4 g2 g4 g3 g2
# Levels: g1 g2 g3 g4
The result is also a factor (you can use as.character if you prefer).
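If it helps, here is a short sketch of that conversion, plus a count per bin. The names grp and grp_chr are just illustrative.

```r
set.seed(42)
x <- floor(runif(50, 0, 40))

# factor with the manual labels, as above
grp <- cut(x, c(0, 10, 20, 30, Inf), labels = paste0("g", 1:4), include.lowest = TRUE)
class(grp)      # "factor"

# plain character vector, if a factor is not wanted
grp_chr <- as.character(grp)
class(grp_chr)  # "character"

# counts per bin; the factor keeps all four levels even if one is empty
table(grp)
```

Keeping the factor is usually handy for things like table() or plotting, since the level order g1 to g4 is preserved; the character version loses that ordering information.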
Upvotes: 2