SavedByJESUS

Reputation: 3314

Creating a variable from a condition with more than 2 arguments

Here is a simplified version of what I'm trying to do. I have the following vector:

wage = 1:10 # Generate a sequence from 1 to 10

And I want to create another vector wage_level such that:

(i) wage_level is "low" if wage is less than 5

(ii) wage_level is "normal" if wage is equal to 5

(iii) wage_level is "high" if wage is greater than 5

I know I can use nested ifelse() statements to do it. However, as I pointed out earlier, this is only a simplified version of what I really want to do: in the real problem I have about 15 alternatives.

Edit

The answer provided below makes use of the cut() function, which actually works well in many cases. However, it does not seem to "work" in my case. Following is the detailed explanation.

I was able to use the cut() function to create the wage_level vector:

wage = runif(10, 1, 10) # Randomly generate 10 values between 1 and 10

# Here I use the cut() function
wage_level = cut(wage,
                 breaks = c(1, 4, 6, 10),
                 labels = c("low", "normal", "high"),
                 include.lowest = TRUE)
> wage
[1] 5.522422 4.793292 8.161671 5.480415 1.396909 3.403013 4.940242 7.762142 6.364159 4.603998

> wage_level
[1] normal normal high   normal low    low    normal high   high   normal
Levels: low normal high

Now, let's suppose I want to use the wage_level vector to create another vector (the rating vector) using the cut() function. The condition to create the rating vector is as follows:

(i) rating is 1 if wage_level is "low"

(ii) rating is 2 if wage_level is "normal"

(iii) rating is 3 if wage_level is "high"

My problem is that the cut() function does not make the rating vector a numeric vector with the values of my choice. The following code does not work:

rating = cut(as.numeric(wage_level),
                 breaks = c(0, 1, 2, 3),
                 labels = c(1.2, 6.5, 8.9),
                 include.lowest = TRUE)

> as.numeric(rating)
 [1] 2 2 3 2 1 1 2 3 3 2

I mainly have two problems here:

(i) I would have preferred a way to use the actual strings (i.e. "low", "normal" and "high") instead of the label indexes

(ii) The values in the rating vector have nothing to do with the values I specified.

Any other method to achieve the desired result?

Thank you very much for your help :)

Upvotes: 1

Views: 480

Answers (1)

ndoogan

Reputation: 1925

wage <- 1:10
cut(wage, breaks = c(0, 4, 5, 10), include.lowest = TRUE, labels = c("low", "normal", "high"))
# [1] low    low    low    low    normal high   high   high   high   high  
#Levels: low normal high

What if the vector isn't ordered? No difference:

wage <- runif(10,1,10)
wage
# [1] 8.535146 4.964819 7.228050 9.150132 6.369952 8.451137 8.022293 7.621226
# [9] 1.070368 5.931904

cut(wage, breaks = c(0, 4, 5, 10), include.lowest = TRUE, labels = c("low", "normal", "high"))
# [1] high   normal high   high   high   high   high   high   low    high  
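
To see which side of each break a value lands on, here is a small check on boundary values. It is only a sketch and assumes cut()'s default right = TRUE, so each interval is closed on the right; the vector x and the name levels_out are made up for the illustration.

```r
# Boundary values around the breaks 4 and 5 (illustrative data)
x <- c(4, 4.5, 5, 5.5)

# Same breaks and labels as above; right = TRUE is the default,
# so the intervals are [0, 4], (4, 5] and (5, 10]
levels_out <- cut(x,
                  breaks = c(0, 4, 5, 10),
                  labels = c("low", "normal", "high"),
                  include.lowest = TRUE)

as.character(levels_out)
# [1] "low"    "normal" "normal" "high"
```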

Though, notice that the "normal" level is applied to every value in the interval (4, 5], not only to values exactly equal to 5. If you're really working with reals, then looking for exactly 5 might be an odd choice.
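
For the follow-up in the question's edit (turning "low", "normal" and "high" into chosen numbers), one possible approach, which is not part of the original answer, is to index a named numeric vector by the level names. The sketch below assumes a hypothetical lookup vector called rating_values and a small made-up wage_level factor; the values 1.2, 6.5 and 8.9 are taken from the question.

```r
# Illustrative factor with the same levels as in the question
wage_level <- factor(c("normal", "low", "high", "normal"),
                     levels = c("low", "normal", "high"))

# Hypothetical lookup table: level name -> chosen numeric value
rating_values <- c(low = 1.2, normal = 6.5, high = 8.9)

# Index by the level *names* (as.character), not by the factor itself,
# so the result does not depend on the order of the levels
rating <- unname(rating_values[as.character(wage_level)])

rating
# [1] 6.5 1.2 8.9 6.5
```

This also explains the output seen in the question: as.numeric() applied to a factor returns the integer level codes (1, 2, 3), not the labels. Converting with as.numeric(as.character(rating)) would recover numeric labels from a factor, but the named-vector lookup avoids the second cut() call and scales to 15 or so levels by adding entries to the vector.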

Upvotes: 4
