Zach

Reputation: 2455

Data frames and is.nan()

I was using sum(is.na(my.df)) to check whether my data frame contained any NAs, which worked as I expected, but sum(is.nan(my.df)) did not work as I expected.

> my.df <- data.frame(a=c(1, 2, 3), b=c(5, NA, NaN))
> my.df
  a   b
1 1   5
2 2  NA
3 3 NaN
> is.na(my.df)
         a     b
[1,] FALSE FALSE
[2,] FALSE  TRUE
[3,] FALSE  TRUE
> is.nan(my.df)
    a     b 
FALSE FALSE 
> sum(is.na(my.df))
[1] 2
> sum(is.nan(my.df))
[1] 0

Oh dear. Is there a reason for the inconsistency in behaviour? Is it for a lack of implementation, or is it intentional? What does the return value of is.nan(my.df) signify? Is there a good reason not to use is.nan() on a whole data frame?

In the documentation for is.na() and is.nan(), the argument types seem the same (although they don't specifically list data frames):

is.na(): x R object to be tested: the default methods handle atomic vectors, lists and pairlists.
is.nan(): x R object to be tested: the default methods handle atomic vectors, lists and pairlists.

Upvotes: 27

Views: 60391

Answers (2)

Adam Erickson

Reputation: 6363

Unlike is.na, the is.nan function has no method for data frames, so it does not test them element by element. Why the two differ is beyond me; it looks like a language design issue. However, there is a simple workaround, which is to replace NaN with NA one column at a time:

df <- data.frame(a=c(1, 2, 3), b=c(5, NA, NaN))
# Replace NaN with NA column by column; assigning into df[] keeps the data frame and each column's type
df[] <- lapply(df, function(x) replace(x, is.nan(x), NA))
df
  a  b
1 1  5
2 2 NA
3 3 NA

Upvotes: 8

Ben Bolker

Reputation: 226182

From ?is.nan:

All elements of logical, integer and raw vectors are considered not to be NaN, and
elements of lists and pairlists are also unless the element is a length-one numeric
or complex vector whose single element is NaN.

The columns of a data frame are technically "elements of a list", so is.nan(df) returns a vector with length equal to the number of columns of the data frame, which is TRUE only if the column consists of a single NaN element:

> is.nan(data.frame(a=NaN,b=NA,c=1))
    a     b     c 
 TRUE FALSE FALSE 

If you want behavior matching that of is.na, use apply:

sum(apply(my.df,2,is.nan))

The answer is 1 rather than 2 because is.nan(NA) is FALSE ...
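
As a quick check of that point, here is a minimal sketch that reuses the question's my.df and counts missing values per column. The names na_counts and nan_counts are only illustrative.

```r
my.df <- data.frame(a = c(1, 2, 3), b = c(5, NA, NaN))

# is.na() treats both NA and NaN as missing; is.nan() flags only NaN
is.na(c(NA, NaN))
is.nan(c(NA, NaN))

# Per-column counts, so the NA/NaN difference shows up in column b
na_counts  <- vapply(my.df, function(x) sum(is.na(x)),  integer(1))
nan_counts <- vapply(my.df, function(x) sum(is.nan(x)), integer(1))
```

Column b contributes two to the is.na count (the NA and the NaN) but only one to the is.nan count.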

edit: alternatively, you can just turn the data frame into a matrix:

sum(is.nan(as.matrix(my.df)))

update: this behaviour changed shortly (two months) after the question was asked, in R version 2.14 (October 2011): from the NEWS file,

o The default methods for is.finite(), is.infinite() and is.nan() now signal an error if their argument is not an atomic vector.
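
Since is.nan(my.df) now errors instead of returning one value per column, a small wrapper can give a result shaped like is.na(my.df). This is only a sketch: is_nan_df is a hypothetical helper name, not part of base R, and it assumes every column is an atomic vector.

```r
my.df <- data.frame(a = c(1, 2, 3), b = c(5, NA, NaN))

# Hypothetical helper (not in base R): run is.nan() on each column and
# bind the logical vectors into a matrix with one column per variable.
is_nan_df <- function(df) {
  stopifnot(is.data.frame(df))
  do.call(cbind, lapply(df, is.nan))
}

nan_mask <- is_nan_df(my.df)
nan_total <- sum(nan_mask)
```

The resulting logical matrix can be passed to sum(), which(), or used for indexing in the same way as the output of is.na(my.df).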

Upvotes: 28
