petemq

Reputation: 23

if-statement in R: "missing value" error despite existing value

An if-statement returns a "missing value" error even though a perfectly valid value is present.

I wanted to write a simple script to delete rows in a dataset if one of their entries contains a certain tag. I assign an indicator variable in a new column (containsMR) and then iterate over the rows using a for-loop. If the indicator is TRUE, the row should be removed.

The indicators get assigned correctly; so far, so good. The interesting part: the loop's if-statement seems to have trouble reading the values, because it returns "Error in if (data$containsMR[i]) { : missing value where TRUE/FALSE needed".

Given the correct (and complete) assignment of indicator variables, this surprises me. Even weirder: some, but not all, of the rows with a positive indicator are removed (checked with printouts and table(data$containsMR)).

And now the really weird stuff: if I run the same loop one more time, it removes the rest of the flagged rows (as it should), but returns the same error. So, theoretically, I could just run the loop twice, ignore the errors and walk away with the result I wanted. That's just really not the point of what I'm doing.

Bugfixes tried:
- changed the for-loop to a while-loop
- changed the indicators (and the if-statement) to integers (0, 1)
- ran the script in both RStudio and the R console
- changed variable names and included/excluded definitions (e.g. adding the proxy variable row_number instead of computing it inline in the loop header)

# Script to delete all rows containing "MR" in column "EXAM_CODE"

# import file
data <- read.csv("C:\\ScriptingTest\\ablations 0114.csv")

# add indicator column
for (i in 1:nrow(data)){
    data$containsMR[i] <- ifelse(grepl("MR", toString(data$EXAM_CODE[i])), TRUE, FALSE)
}

# remove rows with positive indicator
row_number <- nrow(data)
for (i in 1:row_number){
    if (data$containsMR[i]){
        data <- data[-c(i),]
    }
}

# export csv
write.csv(data, "C:\\ScriptingTest\\export.csv")

Upvotes: 0

Views: 142

Answers (2)

joran

Reputation: 173517

The problem is that you are changing the size of the object inside the for loop that iterates over it. To illustrate, see this example:

n <- nrow(mtcars)

for (i in 1:n){
  cat("\n mtcars currently has",nrow(mtcars),"rows;","accessing row",i)
  if (mtcars$cyl[i] == 4){
    mtcars <- mtcars[-i,]
  }
}

 mtcars currently has 32 rows; accessing row 1
 mtcars currently has 32 rows; accessing row 2
 mtcars currently has 32 rows; accessing row 3
 mtcars currently has 31 rows; accessing row 4
 mtcars currently has 31 rows; accessing row 5
 mtcars currently has 31 rows; accessing row 6
 mtcars currently has 31 rows; accessing row 7
 mtcars currently has 30 rows; accessing row 8
 mtcars currently has 30 rows; accessing row 9
 mtcars currently has 30 rows; accessing row 10
 mtcars currently has 30 rows; accessing row 11
 mtcars currently has 30 rows; accessing row 12
 mtcars currently has 30 rows; accessing row 13
 mtcars currently has 30 rows; accessing row 14
 mtcars currently has 30 rows; accessing row 15
 mtcars currently has 30 rows; accessing row 16
 mtcars currently has 29 rows; accessing row 17
 mtcars currently has 28 rows; accessing row 18
 mtcars currently has 28 rows; accessing row 19
 mtcars currently has 28 rows; accessing row 20
 mtcars currently has 28 rows; accessing row 21
 mtcars currently has 28 rows; accessing row 22
 mtcars currently has 27 rows; accessing row 23
 mtcars currently has 26 rows; accessing row 24
 mtcars currently has 26 rows; accessing row 25
 mtcars currently has 26 rows; accessing row 26
 mtcars currently has 25 rows; accessing row 27
Error in if (mtcars$cyl[i] == 4) { : 
  missing value where TRUE/FALSE needed
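
As the output shows, each deletion shortens the data frame while the loop still counts up to the original row count, so the index eventually points past the last row. Indexing past the end of a vector yields NA, and if() cannot evaluate NA. Deletions also shift the remaining rows up by one, which is why some matching rows get skipped on the first pass. If you want to keep an explicit loop, one possible workaround is to iterate from the last row to the first. The following is only a sketch on a made-up data frame (df and its values are hypothetical stand-ins for the CSV in the question):

```r
# Toy data standing in for the CSV in the question (hypothetical values)
df <- data.frame(EXAM_CODE = c("CT1", "MR2", "MR3", "XR4", "MR5"),
                 stringsAsFactors = FALSE)

# Indexing past the end of a vector gives NA, which if() rejects
x <- c(TRUE, FALSE)
is.na(x[3])  # TRUE

# Walk the rows from last to first: deleting row i only shifts rows
# after i, and those have already been visited
for (i in rev(seq_len(nrow(df)))) {
  if (grepl("MR", df$EXAM_CODE[i])) {
    df <- df[-i, , drop = FALSE]
  }
}

df$EXAM_CODE  # "CT1" "XR4"
```

seq_len(nrow(df)) is used instead of 1:nrow(df) so that an empty data frame results in zero iterations rather than iterating over c(1, 0).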

Upvotes: 1

Nick

Reputation: 286

You might be able to simplify this to

newdata <-  data[!grepl("MR", data$EXAM_CODE),]
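
This works because grepl() is vectorized: it returns one TRUE/FALSE per element of the column, so neither the indicator loop nor the deletion loop is needed. A small self-contained sketch, using a made-up data frame (exams and its values are assumptions, not the asker's data):

```r
# Hypothetical stand-in for the imported CSV
exams <- data.frame(
  EXAM_CODE = c("CT_HEAD", "MR_KNEE", "XR_CHEST", "MR_SPINE", NA),
  stringsAsFactors = FALSE
)

# One logical value per row; grepl() returns FALSE (not NA) for missing codes
containsMR <- grepl("MR", exams$EXAM_CODE, fixed = TRUE)
containsMR  # FALSE TRUE FALSE TRUE FALSE

# Keep only the rows whose code does not contain "MR"
newdata <- exams[!containsMR, , drop = FALSE]
newdata$EXAM_CODE  # "CT_HEAD" "XR_CHEST" NA
```

fixed = TRUE treats "MR" as a literal string rather than a regular expression, which makes no difference for this pattern but avoids surprises if the tag ever contains special characters. Note that rows with a missing EXAM_CODE are kept by this filter.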

Upvotes: 0
