The following is a simplified version of what I'm trying to do. I have the following vector:
wage = 1:10 # Generate a sequence from 1 to 10
And I want to create another vector, wage_level, such that:
(i) wage_level is "low" if wage is less than 5
(ii) wage_level is "normal" if wage is equal to 5
(iii) wage_level is "high" if wage is greater than 5
I know I can use nested ifelse statements to do it. However, this is only a simplified version of what I really want to do: in my actual problem I have about 15 alternatives.
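For reference, a nested ifelse() version of the three-case rule might look like the sketch below. It only illustrates the approach the question wants to avoid: each extra alternative adds another level of nesting, which becomes hard to read with around 15 cases.

```r
wage <- 1:10  # same example vector as above

# Nested ifelse(): each additional case needs one more nested call
wage_level <- ifelse(wage < 5, "low",
              ifelse(wage == 5, "normal", "high"))

wage_level
#  [1] "low"    "low"    "low"    "low"    "normal" "high"   "high"   "high"
#  [9] "high"   "high"
```

Note that the result here is a character vector, not a factor.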
Edit
The answer provided below makes use of the cut() function, which works well in many cases. However, it does not seem to work in my case. A detailed explanation follows.
I was able to use the cut() function to create the wage_level vector:
wage = runif(10, 1, 10) # Randomly generate 10 values between 1 and 10
# Here I use the cut() function
wage_level = cut(wage,
breaks = c(1, 4, 6, 10),
labels = c("low", "normal", "high"),
include.lowest = TRUE)
> wage
[1] 5.522422 4.793292 8.161671 5.480415 1.396909 3.403013 4.940242 7.762142 6.364159 4.603998
> wage_level
[1] normal normal high normal low low normal high high normal
Levels: low normal high
Now, let's suppose I want to use the wage_level vector to create another vector (the rating vector) using the cut() function. The conditions for creating the rating vector are as follows:
(i) rating is 1 if wage_level is equal to "low"
(ii) rating is 2 if wage_level is equal to "normal"
(iii) rating is 3 if wage_level is equal to "high"
My problem is that using the cut() function will not make the rating vector a numeric vector with the values of my choice. The following code does not work:
rating = cut(as.numeric(wage_level),
breaks = c(0, 1, 2, 3),
labels = c(1.2, 6.5, 8.9),
include.lowest = TRUE)
> as.numeric(rating)
[1] 2 2 3 2 1 1 2 3 3 2
I mainly have two problems here:
(i) I would have preferred a way to use the actual strings (i.e. "low", "normal" and "high") instead of the label indexes
(ii) The values in the rating vector have nothing to do with the values I specified.
Is there another method to achieve the desired result?
Thank you very much for your help :)
Upvotes: 1
Views: 480
Reputation: 1925
wage <- 1:10
cut(wage, breaks = c(0, 4, 5, 10), include.lowest = TRUE, labels = c("low", "normal", "high"))
# [1] low low low low normal high high high high high
# Levels: low normal high
What if the vector isn't ordered? No difference:
wage <- runif(10,1,10)
wage
# [1] 8.535146 4.964819 7.228050 9.150132 6.369952 8.451137 8.022293 7.621226
# [9] 1.070368 5.931904
cut(wage, breaks = c(0, 4, 5, 10), include.lowest = TRUE, labels = c("low", "normal", "high"))
# [1] high normal high high high high high high low high
Note, though, that with these breaks the "normal" level is applied to every value greater than 4 and up to (and including) 5, not only to values exactly equal to 5. If you're really working with real numbers, then looking for exactly 5 might be an odd choice.
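On the follow-up in the question's edit: as.numeric() applied to a factor returns the integer level codes rather than the labels, which is why 2 2 3 ... comes back instead of 6.5 6.5 8.9 .... One possible way around both issues is to skip the second cut() and index a named numeric vector by the level names. The following is a minimal sketch, not the only way to do it; rating_values is a hypothetical name, and the fixed wage values are made up here so the result is reproducible.

```r
# Fixed example values (made up for this sketch) so the output is reproducible
wage <- c(5.5, 4.8, 8.2, 1.4, 3.4)

wage_level <- cut(wage,
                  breaks = c(1, 4, 6, 10),
                  labels = c("low", "normal", "high"),
                  include.lowest = TRUE)

# Hypothetical lookup table: level name -> numeric rating
rating_values <- c(low = 1.2, normal = 6.5, high = 8.9)

# Index by the level *names* (not the integer codes), then drop the names
rating <- unname(rating_values[as.character(wage_level)])

rating
# [1] 6.5 6.5 8.9 1.2 1.2
```

This uses the actual strings and yields a plain numeric vector, and with about 15 alternatives only the lookup table grows. If the labels passed to cut() are themselves numbers, as.numeric(as.character(rating)) should also recover them from the factor.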
Upvotes: 4