Reputation: 2455
I was using sum(is.na(my.df)) to check whether my data frame contained any NAs, which worked as I expected, but sum(is.nan(my.df)) did not work as I expected.
> my.df <- data.frame(a=c(1, 2, 3), b=c(5, NA, NaN))
> my.df
  a   b
1 1   5
2 2  NA
3 3 NaN
> is.na(my.df)
         a     b
[1,] FALSE FALSE
[2,] FALSE  TRUE
[3,] FALSE  TRUE
> is.nan(my.df)
    a     b
FALSE FALSE
> sum(is.na(my.df))
[1] 2
> sum(is.nan(my.df))
[1] 0
Oh dear.
Is there a reason for the inconsistency in behaviour? Is it due to a lack of implementation, or is it intentional? What does the return value of is.nan(my.df) signify? Is there a good reason not to use is.nan() on a whole data frame?
In the documentation for is.na() and is.nan(), the argument types seem the same (although they don't specifically list data frames):
is.na(): x R object to be tested: the default methods handle atomic vectors, lists and pairlists.
is.nan(): x R object to be tested: the default methods handle atomic vectors, lists and pairlists.
Upvotes: 27
Views: 60391
Reputation: 6363
The is.nan function does not work with lists for some odd reason. Why it differs from is.na is beyond me and appears to be a language design issue. However, there is a simple solution:
df <- data.frame(a=c(1, 2, 3), b=c(5, NA, NaN))
df <- data.frame(sapply(df, function(x) ifelse(is.nan(x), NA, x)))
df
  a  b
1 1  5
2 2 NA
3 3 NA
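One caveat with the sapply() approach above: it goes through a matrix, so a data frame with mixed column types may come back with its columns coerced to a common type. A minimal sketch of an alternative that keeps each column's type, assuming you only want to touch numeric columns (the label column is made up for illustration):

```r
# Example data with a non-numeric column added for illustration
df <- data.frame(a = c(1, 2, 3), b = c(5, NA, NaN),
                 label = c("x", "y", "z"), stringsAsFactors = FALSE)

# df[] <- ... assigns column by column and keeps the data frame structure
df[] <- lapply(df, function(col) {
  if (is.numeric(col)) col[is.nan(col)] <- NA  # replace NaN with NA
  col
})
```

After this, df$b holds 5, NA, NA and df$label is still a character column.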
Upvotes: 8
Reputation: 226182
From ?is.nan:
All elements of logical, integer and raw vectors are considered not to be NaN, and elements of lists and pairlists are also unless the element is a length-one numeric or complex vector whose single element is NaN.
The columns of a data frame are technically "elements of a list", so is.nan(df) returns a vector with length equal to the number of columns of the data frame, each element of which is TRUE only if the corresponding column consists of a single NaN element:
> is.nan(data.frame(a=NaN,b=NA,c=1))
    a     b     c
 TRUE FALSE FALSE
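As far as I can tell, the asymmetry comes from method dispatch: base R has a data frame method for is.na (is.na.data.frame) but none for is.nan, so is.nan falls back to its default handling. A rough sketch that checks this and defines a hypothetical helper, is_nan_df (my own name, not part of base R), returning a logical matrix shaped like the result of is.na(my.df):

```r
# NULL from getS3method(..., optional = TRUE) means "no such method"
has_na_method  <- !is.null(utils::getS3method("is.na",  "data.frame", optional = TRUE))
has_nan_method <- !is.null(utils::getS3method("is.nan", "data.frame", optional = TRUE))

# Hypothetical helper (not in base R): run is.nan on each column and
# bind the per-column results into one logical matrix
is_nan_df <- function(df) do.call(cbind, lapply(df, is.nan))

my.df <- data.frame(a = c(1, 2, 3), b = c(5, NA, NaN))
nan_matrix <- is_nan_df(my.df)   # 3 x 2 logical matrix with columns a, b
nan_count  <- sum(nan_matrix)    # number of NaN cells
```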
If you want behavior matching that of is.na, use apply:
sum(apply(my.df,2,is.nan))
The answer is 1 rather than 2 because is.nan(NA) is FALSE...
edit: alternatively, you can just turn the data frame into a matrix:
sum(is.nan(as.matrix(my.df)))
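If you also want to know which columns contain the NaNs, a per-column count is one option. This is only a sketch using the question's my.df; note that as.matrix() on a data frame with non-numeric columns gives a character matrix, which is.nan() treats as all FALSE, whereas working column by column avoids that conversion:

```r
my.df <- data.frame(a = c(1, 2, 3), b = c(5, NA, NaN))

# Count NaN values in each column; vapply guarantees one integer per column
nan_per_column <- vapply(my.df, function(col) sum(is.nan(col)), integer(1))

# Total across the whole data frame
total_nan <- sum(nan_per_column)
```

Here nan_per_column is a named integer vector (one entry per column), and total_nan matches the result of the as.matrix() version.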
update: this behaviour changed shortly (two months) after the question was asked, in R version 2.14 (October 2011): from the NEWS file,
o The default methods for is.finite(), is.infinite() and is.nan() now signal an error if their argument is not an atomic vector.
Upvotes: 28