qwewqe213

Reputation: 13

apply function yields the wrong answer

I am trying to replace all NAs in those columns that contain only 0 or 1. However, I found that apply fails to handle the NAs correctly. If I first replace the NAs with an arbitrary string, e.g. "Unknown", then lapply and apply yield the same result. Any explanation would be greatly appreciated.

Here is an example.

df<-data.frame(a=c(0,1,NA),b=c(0,1,0),c=c('d',NA,'c'))
apply(df,2,function(x){all(x %in% c(0,1,NA)) })
unlist(lapply(df,function(x){all(x %in% c(0,1,NA))}))

Upvotes: 1

Views: 94

Answers (1)

akrun

Reputation: 886938

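Using apply on a data.frame whose columns have different classes is not recommended; lapply is the better option. The issue is that apply first converts the data frame to a matrix, and because column c is character, the result is a character matrix. During that conversion the numeric columns are passed through format, which pads values to a common width. When a column contains NA, the values 0 and 1 become " 0" and " 1" with a leading space, so they no longer match in %in%: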

apply(df, 2, I)
#     a    b   c  
#[1,] " 0" "0" "d"
#[2,] " 1" "1" NA 
#[3,] NA   "0" "c"
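
To see where the leading space comes from, the following sketch reproduces the conversion step by step. It reuses the df from the question; the variable names formatted, as_matrix_col and padded_match are only illustrative.

```r
# Same data frame as in the question
df <- data.frame(a = c(0, 1, NA), b = c(0, 1, 0), c = c("d", NA, "c"))

# format() pads every element to a common width; "NA" is two characters
# wide, so 0 and 1 gain a leading space
formatted <- format(df$a)
formatted
# [1] " 0" " 1" "NA"

# as.matrix() (which apply() calls on a data frame) formats the numeric
# columns this way and then restores the missing values
as_matrix_col <- as.matrix(df)[, "a"]
as_matrix_col
# [1] " 0" " 1" NA

# The padded string does not match 0, 1 or NA
padded_match <- " 0" %in% c(0, 1, NA)
padded_match
# [1] FALSE
```

Column b has no NA, so its values all have width 1 and are not padded.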

If the first column were already character, it would be left as is during the matrix conversion and no padding would be added, i.e.

df1 <- df
df1$a <- as.character(c(0, 1, NA))
apply(df1, 2, I)
#     a   b   c  
#[1,] "0" "0" "d"
#[2,] "1" "1" NA 
#[3,] NA  "0" "c"

One option is to wrap x in trimws to remove the leading spaces:

apply(df,2,function(x){all(trimws(x) %in% c(0,1,NA)) })
#    a     b     c 
# TRUE  TRUE FALSE 

NOTE: For testing the presence of NA, it is recommended to use is.na instead of %in%.
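
Putting both recommendations together, a minimal sketch with lapply-style iteration and is.na could look like the following. The helper name is_binary and the replacement value 0 are assumptions made for illustration; the question does not say which value the NAs should be replaced with.

```r
# Same data frame as in the question
df <- data.frame(a = c(0, 1, NA), b = c(0, 1, 0), c = c("d", NA, "c"))

# Hypothetical helper: TRUE if every element is NA, 0 or 1
is_binary <- function(x) all(is.na(x) | x %in% c(0, 1))

# vapply iterates over the columns without converting to a matrix,
# so each column keeps its own class
flags <- vapply(df, is_binary, logical(1))
flags
#     a     b     c 
#  TRUE  TRUE FALSE 

# Replace the NAs in the 0/1 columns only (0 is an assumed fill value)
binary_cols <- names(df)[flags]
df[binary_cols] <- lapply(df[binary_cols], function(x) replace(x, is.na(x), 0))
```

Column c is left untouched, including its NA, because it contains values other than 0 and 1.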

Upvotes: 1
