Overtime4728

Reputation: 144

Categorical variable to bin integer data in R

I'd like to create a categorical variable that assigns each value to a bin. So for data like:

x <- floor(runif(50, 0, 40))

The categories will be:

g1 <- (x >= 0) & (x <= 10)
g2 <- (x >= 11) & (x <= 20)
g3 <- (x >= 21) & (x <= 30)
g4 <- (x >= 31)

The variable should then check x against the categories and assign each observation to a bin. Is there a way to do this in a single variable? Apologies if this has been asked before; I couldn't find anything on this specific case.

Upvotes: 0

Views: 449

Answers (1)

r2evans

Reputation: 160407

set.seed(42)
x <- floor(runif(50,0,40))
head(x)
# [1] 36 37 11 33 25 20

head(cut(x, c(0, 10, 20, 30, Inf), include.lowest = TRUE))
# [1] (30,Inf] (30,Inf] (10,20]  (30,Inf] (20,30]  (10,20] 
# Levels: [0,10] (10,20] (20,30] (30,Inf]

head(cut(x, c(0, 10, 20, 30, Inf), labels = FALSE, include.lowest = TRUE))
# [1] 4 4 2 4 3 2

By default, cut returns a factor (first example), which is fine for most uses. Pass labels = FALSE (second example) if you need integer codes instead. The binning is the same either way: every value from 0 to 10 gets the same result out of cut (a 1 in the integer case).
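To see where the bin edges fall, here is a small sketch that runs cut on the boundary values from the question. The vector name edge is only illustrative and not part of the original answer. It relies on cut's default right = TRUE, which closes each interval on the right.

```r
# Boundary values taken from the question's g1..g4 definitions
edge <- c(0, 10, 11, 20, 21, 30, 31, 39)
breaks <- c(0, 10, 20, 30, Inf)

# right = TRUE is cut()'s default: intervals are closed on the right,
# so 10 lands in the first bin and 11 in the second
codes <- cut(edge, breaks, labels = FALSE, include.lowest = TRUE)
codes
# [1] 1 1 2 2 3 3 4 4

# Without include.lowest = TRUE, 0 falls outside (0,10] and becomes NA
is.na(cut(0, breaks))
# [1] TRUE
```

This is why include.lowest = TRUE matters here: floor(runif(...)) can produce 0, which would otherwise be left unbinned.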

In your case, I think you want the "g1" labels, so instead of labels=FALSE, specify the labels manually (as @Ben just suggested):

head(cut(x, c(0, 10, 20, 30, Inf), labels = paste0("g", 1:4), include.lowest = TRUE))
# [1] g4 g4 g2 g4 g3 g2
# Levels: g1 g2 g3 g4

The result is still a factor (wrap it in as.character if you prefer plain strings).
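As a minimal sketch of that conversion, the following stores the binned result in a single variable, converts it to character, and cross-checks one bin against the logical definition from the question. The names bin and bin_chr are illustrative only.

```r
set.seed(42)
x <- floor(runif(50, 0, 40))

# One factor variable holding the bin of every observation
bin <- cut(x, c(0, 10, 20, 30, Inf),
           labels = paste0("g", 1:4), include.lowest = TRUE)

# Convert the factor to plain character strings
bin_chr <- as.character(bin)
head(bin_chr)
# [1] "g4" "g4" "g2" "g4" "g3" "g2"

# Cross-check against the question's definition of g1
all((bin == "g1") == ((x >= 0) & (x <= 10)))
# [1] TRUE

# Count the observations per bin
table(bin)
```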

Upvotes: 2
