a different ben

Reputation: 4008

Why are these NAs returned?

Please excuse the poor question title... I could not think how to ask it in a better way.

I think the code speaks for itself, but just to labour the point: I 'searched' for a coded value in a data frame and replaced every occurrence with NA, but on checking whether they were gone I got a result that surprised me.

> df[df==-999.25]
 [1] "-999.25000000" "-999.25000000" "-999.25000000" "-999.25000000" "-999.25000000"
 [6] "-999.25000000" "-999.25000000" "-999.25000000" "-999.25000000" "-999.25000000"
[11] "-9.992500e+02" "-9.992500e+02" "-9.992500e+02" "-9.992500e+02" "-9.992500e+02"
[16] "-9.992500e+02" "-9.992500e+02" "-9.992500e+02" "-9.992500e+02" "-9.992500e+02"
[21] "-9.992500e+02" "-9.992500e+02" "-999.25000000" "-999.25000000" "-999.25000000"
[26] "-999.25000000" "-999.25000000" "-999.25000000" "-999.25000000" "-999.25000000"
[31] "-999.25000000" "-999.25000000" "-999.25000000" "-999.25000000" "-999.25000000"
[36] "-999.25000000" "-999.25000000" "-999.25000000" "-9.992500e+02" "-9.992500e+02"
[41] "-9.992500e+02" "-9.992500e+02" "-9.992500e+02" "-9.992500e+02" "-9.992500e+02"
[46] "-9.992500e+02" "-9.992500e+02" "-9.992500e+02" "-9.992500e+02" "-9.992500e+02"
[51] "-9.992500e+02" "-9.992500e+02" "-9.992500e+02" "-9.992500e+02" "-9.992500e+02"
[56] "-9.992500e+02" "-9.992500e+02" "-9.992500e+02" "-9.992500e+02" "-9.992500e+02"
[61] "-9.992500e+02" "-9.992500e+02" "-9.992500e+02" "-9.992500e+02"
> df[df==-999.25] <- NA
> df[df==-999.25]
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[30] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[59] NA NA NA NA NA NA

I am confused by this. What is the reason for it? (I am also tired, perhaps I should have sat on this for a day or two). I checked the help for '<-' and '[', but learnt nothing (I could not follow all of it).

Upvotes: 0

Views: 67

Answers (2)

Ehsan Masoudi

Reputation: 712

When you assign NA to an object, you are saying that its value is unknown (missing). Logical comparisons and arithmetic such as *, +, <, == and > cannot be evaluated on a missing value, so R returns NA in these cases:

a <- NA

a * 0
[1] NA

a/0
[1] NA

a<0
[1] NA

a == 0
[1] NA

Finally, I guess you expected df[df==-999.25] to come back empty, with the comparison giving FALSE instead of NA. But R cannot evaluate your logical statement when it has no idea what the Not Available (missing) value is.
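
To connect this to the subsetting in the question, here is a minimal sketch. The vector x is hypothetical, standing in for one column of df after the replacement. It shows that the comparison yields NA for the missing element, and that an NA inside a logical index comes back as an NA in the result.

```r
# Hypothetical vector standing in for one column of df after the replacement
x <- c(1.5, NA, 3)

# Comparing NA with a number gives NA, not FALSE
idx <- x == -999.25
print(idx)

# An NA in a logical index returns an NA placeholder for that position,
# so the result has one element (NA) rather than being empty
res <- x[idx]
print(res)
```

So the NAs printed in the question are not leftover -999.25 values; they are the positions where the comparison itself could not be decided.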

Upvotes: 1

JeremyS

Reputation: 3525

NAs are always returned when you subset with == and NAs are present, because comparing NA to anything gives NA, and an NA in a logical index returns NA by default. If you want them gone, add & !is.na(df) to the condition.

for example

test <- c(NA,5,3,2,34,"Bob")
test[test == "Bob"]
[1] NA    "Bob"

because

test == "Bob"
[1]    NA FALSE FALSE FALSE FALSE  TRUE
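
Applied to the question, the check might look like the sketch below. The data frame here is made up for illustration (the real df is not shown in full), and which() is offered as an alternative way to drop the NAs from the index, not as the only fix.

```r
# Hypothetical data frame that uses -999.25 as its missing-value code
df <- data.frame(a = c(1, -999.25, 3), b = c(-999.25, 5, -999.25))

df[df == -999.25] <- NA

# NA & FALSE is FALSE, so this mask contains no NAs
mask <- df == -999.25 & !is.na(df)
remaining <- df[mask]
print(length(remaining))

# which() keeps only the TRUE positions, dropping NA along with FALSE
test <- c(NA, 5, 3, 2, 34, "Bob")
only_bob <- test[which(test == "Bob")]
print(only_bob)
```

If all you need is a count of remaining matches, sum(df == -999.25, na.rm = TRUE) avoids the subsetting altogether.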

Upvotes: 3
