Reputation: 23
An if-statement returns a "missing value" error when there is a perfectly healthy value.
I wanted to write a simple script to delete rows in a dataset if one of their entries contains a certain tag. I assign an indicator variable in a new column (containsMR) and then iterate over the rows using a for-loop. If the indicator is TRUE, the row should be removed.
The indicators get assigned correctly; so far, so good. The interesting part: the if-statement in the loop seems to have trouble reading the values, because it returns "Error in if (data$containsMR[i]) { : missing value where TRUE/FALSE needed".
Given that the indicator variables are assigned correctly and completely, this surprises me. Even stranger: some, but not all, of the rows with a positive indicator are removed (checked with printouts and table(data$containsMR)).
And now the really weird part: if I run the same loop a second time, it removes the remaining rows (as it should), but returns the same error. So, in theory, I could just run the loop twice, ignore the errors, and walk away with the result I wanted. But that's really not the point of what I'm doing.
Bugfixes tried:
- changed the for-loop to a while-loop
- changed the indicators (and the if-statement) to integers (0, 1)
- ran the script in both RStudio and the R console
- changed variable names and included/excluded definitions (e.g. adding the proxy variable row_number instead of calling nrow(data) inline)
# Script to delete all rows containing "MR" in column "EXAM_CODE"
# import file
data <- read.csv("C:\\ScriptingTest\\ablations 0114.csv")
# add indicator column
for (i in 1:nrow(data)){
  data$containsMR[i] <- ifelse(grepl("MR", toString(data$EXAM_CODE[i])), TRUE, FALSE)
}
# remove rows with positive indicator
row_number <- nrow(data)
for (i in 1:row_number){
  if (data$containsMR[i]){
    data <- data[-c(i),]
  }
}
# export csv
write.csv(data, "C:\\ScriptingTest\\export.csv")
Upvotes: 0
Views: 142
Reputation: 173517
The problem is that you are modifying the size of the object you are looping over inside the for loop. To illustrate, see this example:
n <- nrow(mtcars)
for (i in 1:n){
  cat("\n mtcars currently has",nrow(mtcars),"rows;","accessing row",i)
  if (mtcars$cyl[i] == 4){
    mtcars <- mtcars[-i,]
  }
}
> mtcars currently has 32 rows; accessing row 1
mtcars currently has 32 rows; accessing row 2
mtcars currently has 32 rows; accessing row 3
mtcars currently has 31 rows; accessing row 4
mtcars currently has 31 rows; accessing row 5
mtcars currently has 31 rows; accessing row 6
mtcars currently has 31 rows; accessing row 7
mtcars currently has 30 rows; accessing row 8
mtcars currently has 30 rows; accessing row 9
mtcars currently has 30 rows; accessing row 10
mtcars currently has 30 rows; accessing row 11
mtcars currently has 30 rows; accessing row 12
mtcars currently has 30 rows; accessing row 13
mtcars currently has 30 rows; accessing row 14
mtcars currently has 30 rows; accessing row 15
mtcars currently has 30 rows; accessing row 16
mtcars currently has 29 rows; accessing row 17
mtcars currently has 28 rows; accessing row 18
mtcars currently has 28 rows; accessing row 19
mtcars currently has 28 rows; accessing row 20
mtcars currently has 28 rows; accessing row 21
mtcars currently has 28 rows; accessing row 22
mtcars currently has 27 rows; accessing row 23
mtcars currently has 26 rows; accessing row 24
mtcars currently has 26 rows; accessing row 25
mtcars currently has 26 rows; accessing row 26
mtcars currently has 25 rows; accessing row 27
Error in if (mtcars$cyl[i] == 4) { :
missing value where TRUE/FALSE needed
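The printout shows the mechanism: each deletion shifts every later row up by one, while `i` keeps counting toward the original `n`. The row that slides into the just-vacated position is never checked, which is why only some matching rows disappear on the first pass (and why a second pass removes more). Near the end, `i` points past the last remaining row, `mtcars$cyl[i]` returns `NA`, and `if()` refuses it. Below is a minimal sketch of two loop-based workarounds, assuming you want to keep an explicit loop at all: walk the rows from last to first, so a deletion only shifts rows that were already visited, or collect the indices with `which()` and drop them in one step. The names `cars_back`, `cars_once` and `cars_vec` are purely illustrative.

```r
# Indexing past the end of a vector returns NA, and if() cannot handle NA
x <- c(TRUE, FALSE)
x[3]              # NA
# if (x[3]) {}    # Error: missing value where TRUE/FALSE needed

# Workaround 1: loop backwards, so deleting row i never shifts unvisited rows
cars_back <- mtcars
for (i in rev(seq_len(nrow(cars_back)))) {
  if (cars_back$cyl[i] == 4) {
    cars_back <- cars_back[-i, ]
  }
}

# Workaround 2: collect the indices first, then delete them all at once.
# The guard matters: df[-integer(0), ] would return zero rows, not all rows.
drop <- which(mtcars$cyl == 4)
cars_once <- if (length(drop) > 0) mtcars[-drop, ] else mtcars

# Both agree with a plain vectorized subset
cars_vec <- mtcars[mtcars$cyl != 4, ]
identical(cars_back, cars_vec)
identical(cars_once, cars_vec)
```

The vectorized subset is usually simpler and faster than either loop, since each `[-i, ]` inside a loop copies the whole data frame; the loops are mainly useful for seeing why the original version misbehaves.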
Upvotes: 1
Reputation: 286
You might be able to simplify this to a single vectorized subset:
newdata <- data[!grepl("MR", data$EXAM_CODE),]
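A small self-contained sketch of that one-liner follows. The `EXAM_CODE` values and the `PATIENT` column are hypothetical stand-ins, since the original CSV isn't available. `grepl()` works on the whole column at once, so neither the loop nor the indicator column is needed; it coerces factors with `as.character()`, so the `toString()` call is unnecessary; and it returns `FALSE` (not `NA`) for a missing code, so such a row is kept rather than causing an error. `fixed = TRUE` is optional here and simply matches "MR" literally instead of as a regular expression.

```r
# Hypothetical stand-in for the imported CSV; real EXAM_CODE values are unknown
data <- data.frame(
  EXAM_CODE = c("CT01", "MR02", "XR03", NA, "MRA04"),
  PATIENT   = c(1, 2, 3, 4, 5)
)

# TRUE wherever EXAM_CODE contains "MR"; the NA code yields FALSE, not NA
is_mr <- grepl("MR", data$EXAM_CODE, fixed = TRUE)

# Keep every row without "MR" in one vectorized subset
newdata <- data[!is_mr, ]
newdata$PATIENT   # 1 3 4
```

Because the logical vector is computed once for all rows before subsetting, there are no shifting indices and no chance of reading past the end of the data frame.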
Upvotes: 0