Reputation: 133
I want to categorize a vector of values between 0 and 1. Values below .001 and values higher than .10 are of no interest, so I want values in these ranges to be NA.
When I run the code below I get an error:
Error in if (x[i] > 0.001 & x[i] <= 0.01) x[i] = 0.01 : missing value where TRUE/FALSE needed
How do I fix my code?
for (i in 1:length(x))
{
if (x[i] <= .001)
x[i] = NA
if (x[i] > .001 & x[i] <= .01)
x[i] = .01
if (x[i] > .01 & x[i] <= .02)
x[i] = .02
if (x[i] > .02 & x[i] <= .03)
x[i] = .03
if (x[i] > .03 & x[i] <= .04)
x[i] = .04
if (x[i] > .04 & x[i] <= .05)
x[i] = .05
if (x[i] > .05 & x[i] <= .06)
x[i] = .06
if (x[i] > .06 & x[i] <= .07)
x[i] = .07
if (x[i] > .07 & x[i] <= .08)
x[i] = .08
if (x[i] > .08 & x[i] <= .09)
x[i] = .09
if (x[i] > .09 & x[i] <= .10)
x[i] = .10
if (x[i] > .10 & x[i] <= 1)
x[i] = NA
}
Upvotes: 4
Views: 30876
Reputation: 193687
First, some test data:
set.seed(1); x = dnorm(rnorm(100))/(sample(1:100, 100, replace=TRUE))
Subsetting can be done in the following way:
x[x < .001] = NA
x[x > .1] = NA
Or, you can combine it in one statement:
x[x < .001 | x > .1] = NA
Your loop fails once a value has been set to NA: every later if then compares NA with a number, and if cannot branch on NA. So remove the NA assignments from your for loop, but index those values before you run the loop so you can set them to NA afterwards.
temp = which(x < .001 | x > .1) # Index the values you want to set as NA
Remove the following conditions from your for loop:
if (x[i] > .10 & x[i] <= 1)
x[i] = NA
if (x[i] <= .001)
x[i] = NA
Run your for loop, and then use temp to set the values that should be NA to NA.
x[temp] = NA
Hope this helps!
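To see where the error comes from: after the first if sets x[i] to NA, the next condition evaluates to NA, and if cannot branch on NA. Below is a minimal sketch, using made-up sample values, of one way around it that keeps the loop: chain the tests with else if so that at most one branch runs per element. Only the first two bins are written out here, so values between .02 and .10 would pass through unchanged until the remaining branches are added.

```r
x <- c(0.0005, 0.004, 0.015, 0.5)   # hypothetical sample values

# A comparison with NA yields NA, which `if` rejects with the error above
is.na(NA > 0.001)

# Chaining with `else if` stops testing once a branch has assigned a value
for (i in seq_along(x)) {
  if (x[i] <= 0.001 || x[i] > 0.10) {
    x[i] <- NA
  } else if (x[i] <= 0.01) {
    x[i] <- 0.01
  } else if (x[i] <= 0.02) {
    x[i] <- 0.02
  }
  # the remaining bins up to 0.10 would follow the same pattern
}
x
```

Note that this still assumes the input contains no NA to begin with; an is.na(x[i]) check would be needed otherwise.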
x[x < .001 | x > .1] = NA
out <- ceiling(x*100)/100
Pretty much the same as AKE's suggestion using floor.
This should get you the same results as your loop.
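A quick way to convince yourself is to run the two lines on a handful of values. This is only a sketch with made-up inputs chosen away from the bin edges:

```r
x <- c(0.0005, 0.0042, 0.015, 0.0999, 0.25)   # hypothetical sample values

x[x < 0.001 | x > 0.1] <- NA       # out-of-range values become NA
out <- ceiling(x * 100) / 100      # round the rest up to the next hundredth
out
```

Values that sit exactly on a hundredth can be affected by floating-point representation when multiplied by 100, so it is worth checking the edges if your data contains such values.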
Upvotes: 7
Reputation: 263481
The findInterval function can be used productively in this very structured choice problem. It produces an index that can "look up" or select the desired result for values in particular intervals:
x <- rnorm(1000)
x <- c(NA, seq(0.1, 1, by=0.1), NA)[
1+ findInterval(x, c(0.001, seq(0.1, 1, by=0.1)) ,rightmost.closed=TRUE) ]
#---------------
table(x)
x
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
34 38 48 44 29 30 26 20 17 31
> table(is.na(x))
FALSE TRUE
317 683
The rightmost.closed argument closes only the last interval on the right; the other intervals keep the usual left closure. In this example it didn't matter, since none of the random draws were on boundaries. It's generally not a good idea to destroy your input data, though; I hope x was a copy of your original data. The other way of doing this would be to omit the 1+ and instead use intervals in the findInterval second argument like c(-Inf, 0.001, seq(0.1, 1, by=0.1), Inf).
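The example above uses bins of width 0.1. Applied to the bins from the question, the same lookup idea might look like the sketch below. The sample values and the names x_orig, upper, breaks, idx and x_binned are made up for illustration, and it assumes an R version whose findInterval has the left.open argument (3.3.0 or later, as far as I know).

```r
x_orig <- c(0.0005, 0.004, 0.015, 0.095, 0.5)   # hypothetical sample values

upper  <- seq(0.01, 0.10, by = 0.01)            # upper edge of each bin
breaks <- c(0.001, upper)

# left.open = TRUE makes each interval (lower, upper], matching the loop
idx <- findInterval(x_orig, breaks, left.open = TRUE)

# idx is 0 below the first break and 11 above the last one; both map to NA
x_binned <- c(NA, upper, NA)[idx + 1]
x_binned
```

Writing the result to a new variable leaves the original data untouched.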
Upvotes: 0
Reputation: 2330
Instead of using an explicit for loop, you should try to use a vectorized function, such as the very handy ifelse. Here is how to recode the NAs in your example:
> x <- ifelse(x <= 0.001 | x > 0.1, NA, x)
To recode the other values, you could try some "clever" use of cut:
> x <- cut(x, breaks=seq(0, 0.1, 0.01), labels=FALSE) / 100
though there are likely better (and more transparent) ways. The reason for avoiding explicit for loops in R is that they are very inefficient when compared to vectorized alternatives. The R Inferno provides a good discussion of this and other R tricks and tips.
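As a sketch of how the two steps could be folded into one, cut can be given the lower cutoff as its first break, so that anything outside the breaks comes back as NA on its own. The sample values and the names breaks and binned are made up for illustration.

```r
x <- c(0.0005, 0.004, 0.015, 0.095, 0.5)   # hypothetical sample values

breaks <- c(0.001, seq(0.01, 0.10, by = 0.01))

# labels = FALSE returns the bin number (1 to 10); values outside the
# breaks come back as NA, so no separate recoding step is needed
binned <- cut(x, breaks = breaks, labels = FALSE) / 100
binned
```

By default cut uses intervals that are open on the left and closed on the right, which is the same convention as the conditions in the original loop.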
Upvotes: 1
Reputation: 6361
While your solution works conceptually, it is "brute force", which means a lot of typing, won't scale to a slightly different problem, and is also slow to execute.
R allows working with vectors so if your logic works for an arbitrary number between 0 and 1, then it should work with a vector of values between 0 and 1.
Try something like the following:
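y <- floor(100 * x)                   # all values < 0.01 map to 0
y[y >= 10] <- 0                       # force values >= 0.1 to 0
y <- ifelse(y > 0, (y + 1) / 100, 0)  # non-zero values: move to the upper edge of the interval, back on the original scale
The first line squashes all values less than 0.01 to 0. The second line squashes all values of 0.1 or more to 0 (testing y > 10 instead would let values between 0.10 and 0.11 through as 0.11). The third line lifts the remaining non-zero values to the top value of the range (round up) and returns them to the original scale. Note that, unlike your loop, this treats everything below 0.01 as out of range and puts a value that sits exactly on a boundary into the bin above it.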
Upvotes: 0