Reputation: 623
I have some R code that loops through a data.frame one row at a time and, if a certain set of conditions is met, changes the value of one of the variables in the data.frame. In pseudocode:
for(i in 1:nrow(data)) {
if (conditions on data[i,]) { change value } else {do nothing}
}
While the code is running, at a certain point it stops and throws the following error message: Error in if (condition : missing value where TRUE/FALSE needed
I understand that the error message means that, at a certain point, when the condition in the if statement is evaluated, the result is NA rather than TRUE or FALSE.
However, when I try the condition in R by using the value of i that is "stored" in R (and which I assume to be the row of the data set that throws the error) I get an answer of TRUE. Do I understand correctly that the value of i allows me to identify which row of the data frame is throwing the error? If not, should I look for some other way to identify which row of the data set is causing the error?
Upvotes: 1
Views: 186
Reputation: 54
1) replacing values
Wouldn't it be better to use replace?
Some examples here: replace function examples
In your case (replace returns a modified copy, so assign the result back):
df$column <- replace(df$column, your_condition, value)
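As a minimal sketch of the replace approach, assuming a hypothetical data frame df with a numeric column x (the names, values, and threshold below are illustrative, not from the question). A vectorised condition is NA wherever the data is NA, so wrapping it in which() keeps only the positions where the condition is definitely TRUE:

```r
# Hypothetical data; the column name x and the threshold 4 are assumptions.
df <- data.frame(x = c(1, 5, NA, 8))

# The vectorised condition is NA wherever x is NA.
cond <- df$x > 4

# replace() returns a modified copy, so the result is assigned back.
# which() drops the NA positions, so only rows where the condition
# is TRUE are changed.
df$x <- replace(df$x, which(cond), 0)
```

Rows with a missing value are left untouched rather than raising an error, which may or may not be what you want.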
2) filtering
if you're sure your data contains only TRUEs/FALSEs or NAs you can:
a) subset rows with NAs in specific column
df[(is.na(df$column)), ]
b) filter out things using filter from dplyr
library("dplyr")
filter(df, is.na(column)) # keep rows with NAs; in dplyr you don't need $ to specify the column
filter(df, !is.na(column) & column!="FALSE") # filter everything other than NA and FALSE
filter(df, column!="TRUE" & column!="FALSE") # careful with that, won't return NAs
3) selecting row numbers
Finally, when you need the specific row numbers where NAs occur, use which:
which(is.na(df$column)) # row numbers with NAs
which(df$column!="TRUE") # row numbers other than TRUEs
which(df$column!="TRUE" & df$column!="FALSE") # again, won't return NAs
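To illustrate why the comparisons above skip NAs, here is a small sketch on a hypothetical logical column (the data is made up for illustration). Comparing NA with anything yields NA, and which() silently drops those positions:

```r
# Hypothetical logical column with one missing value.
df <- data.frame(column = c(TRUE, NA, FALSE, TRUE))

# Positions of the missing values.
na_rows <- which(is.na(df$column))

# The comparison is NA for the missing row, so which() skips it.
not_true <- which(df$column != "TRUE")

# To include the NA rows as well, test for them explicitly.
not_true_or_na <- which(is.na(df$column) | !df$column)
```

Here na_rows picks out the row that would make an if statement fail, while not_true_or_na covers every row that is not TRUE.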
Upvotes: 0
Reputation: 24955
As long as your for loop is not inside a function, i will be equal to the final value it hit before the error. Thus after your error:
data[i, ]
Should give you the pathological row.
If you are running inside a function, due to scoping rules, i should die with the function. In that case, I would modify your code to print out every line (or i) until it dies:
for(i in 1:nrow(data)) {
print(i) #or print(data[i, ])
if (conditions on data[i,]) { change value } else {do nothing}
}
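An alternative to printing every row is to evaluate the condition for all rows at once and look for the NAs directly. The sketch below assumes a hypothetical data frame and condition (columns a and b and the thresholds are illustrative only), and uses isTRUE() so that the loop treats NA as "condition not met" instead of stopping:

```r
# Hypothetical data and condition; neither comes from the question.
data <- data.frame(a = c(1, 2, NA, 4), b = c(10, 20, 30, 40))

# Evaluate the condition for every row at once.
cond <- data$a > 1 & data$b < 35

# Rows where the condition is NA are the ones that would make if() fail.
bad_rows <- which(is.na(cond))

# isTRUE() returns FALSE for NA, so the loop no longer errors.
for (i in seq_len(nrow(data))) {
  if (isTRUE(cond[i])) data$b[i] <- 0
}
```

After this runs, data[bad_rows, ] shows the rows that need attention.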
Upvotes: 0
Reputation: 226911
I think the answer is "yes"
print(i) ## Error: doesn't exist yet
for (i in 1:10) {
if (i==4) stop("simulated error")
}
print(i) ## 4
The try() function can also be useful. Here we make a function f that simulates the error, then use try() so that we can run all the way through the loop. We don't stop when we hit the error, but instead fill in a value (10000 in this case) that stands for an error code. (We could also just make the error behaviour a no-op, i.e. just go on to the next iteration of the loop; in this case that would leave an NA in the error position.)
f <- function(x) {
  if (x==4) stop("simulated error")
  return(x)
}
results <- rep(NA,10)
for (i in 1:10) {
  res <- try(f(i))
  ## inherits() is the usual base-R check for a "try-error" result
  if (inherits(res,"try-error")) {
    results[i] <- 10000
  } else {
    results[i] <- res
  }
}
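The no-op variant mentioned above could be sketched with tryCatch(), which also makes it easy to record which iterations failed. This is only one possible arrangement; the vector name failed is hypothetical, and f is redefined here so the snippet runs on its own:

```r
# f() mirrors the simulated-error function from the answer.
f <- function(x) {
  if (x == 4) stop("simulated error")
  x
}

results <- rep(NA_real_, 10)
failed <- integer(0)  # hypothetical name: indices that raised an error

for (i in 1:10) {
  results[i] <- tryCatch(
    f(i),
    error = function(e) {
      failed <<- c(failed, i)  # remember which iteration failed
      NA_real_                 # leave NA in the error position
    }
  )
}
```

Once the loop finishes, failed holds the indices to inspect, and the corresponding entries of results stay NA.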
Upvotes: 1