John P. Newbury

Reputation: 45

getting error in [R] - missing value where TRUE/FALSE needed

I am trying to step through a vector to find the outliers, using the IQR to calculate a range. When I run this script looking for values to the right of the IQR I get results, but when I run it to the left I get the error: missing value where TRUE/FALSE needed. How can I scrub out the true and false in my dataset? Here is my script:

data = c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)
Q3 <- quantile(data, 0.75) ## gets the third quartile of the vector
Q1 <- quantile(data, 0.25) ## gets the first quartile of the vector
outliers_left <-(Q1-1.5*IQR(data)) 
outliers_right <-(Q3+1.5*IQR(data))
IQR <- IQR(data)
paste("the interquartile range is", IQR)
Q1 # quantile at 0.25
Q3 # quantile at 0.75
# show the range of numbers we have
paste("your range is", outliers_left, "through", outliers_right, "to determine outliers")
# count how many elements there are and then we will pass this value into a loop to look for 
# anything above and below the Q1-Q3 values
vectorCount <- sum(!is.na(data))
i <- 1
while( i <= vectorCount ){
i <- i + 1
x <- data[i]
# if(x < outliers_left) {print(x)} # uncomment this to run and test for the left
if(x > outliers_right) {print(x)}
}

and the error I get is

[1] 167
[1] 180
[1] 156
Error in if (x > outliers_right) { : 
missing value where TRUE/FALSE needed

As you can see if you run this script, it finds my 3 outliers on the right and then throws the error. But when I run it again on the left of my IQR, where I do have an outlier of 100 in the vector, I just get the error without any other results being displayed. How can I fix this script? Any help is greatly appreciated. I've been scouring the web and my books for days on how to fix this.

Upvotes: 1

Views: 9865

Answers (2)

Chase

Reputation: 69171

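As noted in the comments, the error is due to the way you've constructed your while loop. Because i is incremented before data[i] is read, on the last iteration i == 16 though there are only 15 elements to process, so x is NA and the if() comparison has no TRUE/FALSE to test. Changing the condition from i <= vectorCount to i < vectorCount fixes the problem: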

i <- 1
while( i < vectorCount ){
  i <- i + 1
  x <- data[i]
  # if(x < outliers_left) {print(x)} # uncomment this to run and test for the left
  if(x > outliers_right) {print(x)}
}
#-----
[1] 167
[1] 180
[1] 156
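
One caveat with the loop above: because i is incremented before data[i] is read, the first element (100, the only value below the left fence) is never tested, which is why the left-side check prints nothing. A minimal sketch of a loop that visits every element, assuming the fences are computed as in the question; seq_along() is base R, and found is just an illustrative name:

```r
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)

# Fences computed as in the question (1.5 * IQR beyond the quartiles)
outliers_left  <- quantile(data, 0.25) - 1.5 * IQR(data)
outliers_right <- quantile(data, 0.75) + 1.5 * IQR(data)

found <- c()  # illustrative name: collects the outliers the loop sees
for (i in seq_along(data)) {  # i runs over 1..length(data), never past the end
  x <- data[i]
  if (x < outliers_left || x > outliers_right) {
    print(x)
    found <- c(found, x)
  }
}
```

Letting for() generate the indices removes the manual counter, so there is no off-by-one to get wrong at either end.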

However, this is really not how R works, and you'll soon be frustrated at how long that code takes to run on any appreciably sized data. R is "vectorized", meaning that you can operate on all 15 elements of data at once. To print your outliers, I'd do this:

data[data > outliers_right]
#-----
[1] 167 180 156

Or to get all of them at once using the OR operator:

data[data < outliers_left | data > outliers_right]
#-----
[1] 100 167 180 156

For a little context, the above logical comparisons create a boolean value for each element of data, and the subset keeps only the elements for which that value is TRUE. You can check this for yourself by typing:

data > outliers_right
#----
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE

The [ bit is actually an extraction operator, used to retrieve a subset of a data object. See the help page ?"[" for some good background.
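
As a small illustration of the extraction operator, here is a sketch showing that [ accepts either a logical vector or integer positions. which() is base R and converts the former to the latter; is_high and positions are illustrative names, not part of the original script:

```r
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)
outliers_right <- quantile(data, 0.75) + 1.5 * IQR(data)

is_high   <- data > outliers_right  # logical vector, one value per element of data
positions <- which(is_high)         # integer positions of the TRUE values

data[is_high]    # subset by logical vector
data[positions]  # subset by position; selects the same elements
```

The logical form is usually enough; which() is handy when you also want to know where in the vector the outliers sit.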

Upvotes: 3

mnel

Reputation: 115392

The error message arises because you let i <= vectorCount, so i can equal vectorCount when the loop body starts. After i <- i + 1, data[i] indexes past the end of data and returns NA, and the if statement fails.
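
A short sketch of that failure in isolation. The threshold 128 is a stand-in for outliers_right, and tryCatch() is used here only to capture the error message instead of halting; beyond and msg are illustrative names:

```r
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)

# Indexing one past the end of a vector returns NA rather than raising an error
beyond <- data[length(data) + 1]

# if() needs a single TRUE or FALSE; NA > 128 is NA, so if() stops with an error
msg <- tryCatch(
  if (beyond > 128) "outlier" else "not an outlier",
  error = function(e) conditionMessage(e)
)
print(msg)

# isTRUE() treats NA as "not TRUE", so this guarded test does not fail
if (isTRUE(beyond > 128)) print(beyond)
```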

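If you want to find the outliers based on the IQR, you can use findInterval, with the two fences from your script as the break points (using Q1 and Q3 themselves would select every value outside the middle 50%):

outliers <- data[findInterval(data, c(outliers_left, outliers_right)) != 1]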
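
A sketch of how findInterval() codes each value, assuming the two 1.5 * IQR fences from the question are used as break points; codes is an illustrative name:

```r
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)
outliers_left  <- quantile(data, 0.25) - 1.5 * IQR(data)
outliers_right <- quantile(data, 0.75) + 1.5 * IQR(data)

# 0 = below the first break, 1 = between the breaks, 2 = at or above the second.
# A value exactly equal to the upper fence is coded 2, unlike a strict ">" test.
codes <- findInterval(data, c(outliers_left, outliers_right))
table(codes)

outliers <- data[codes != 1]
outliers
```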

I would also stop using paste to create character messages to be printed; use message instead.
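
A minimal sketch of the difference, assuming the same data as in the question. iqr_value is an illustrative name, chosen so the variable does not reuse the name of the IQR() function:

```r
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)
iqr_value <- IQR(data)

# paste() only builds a string; at top level it is auto-printed with [1] and quotes
paste("the interquartile range is", iqr_value)

# message() concatenates its arguments without a separator and writes to stderr
message("the interquartile range is ", iqr_value)
```

Because message() signals a condition rather than returning a string, callers can silence it with suppressMessages() when the script is run non-interactively.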

Upvotes: 1
