Reputation: 13
I am trying to replace all NAs in the columns that contain only 0 or 1. However, I found that apply fails to handle the NAs. If I replace the NAs with an arbitrary string, e.g. "Unknown", then lapply and apply yield the same result. Any explanation would be greatly appreciated.
Here is an example.
df<-data.frame(a=c(0,1,NA),b=c(0,1,0),c=c('d',NA,'c'))
apply(df,2,function(x){all(x %in% c(0,1,NA)) })
unlist(lapply(df,function(x){all(x %in% c(0,1,NA))}))
Upvotes: 1
Views: 94
Reputation: 886938
It is not recommended to use apply on a data.frame whose columns have different classes. The recommended option is lapply. The issue is that apply converts the data.frame to a matrix, and this can cause problems, especially when missing values are involved, i.e. it creates extra leading spaces.
apply(df, 2, I)
# a b c
#[1,] " 0" "0" "d"
#[2,] " 1" "1" NA
#[3,] NA "0" "c"
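To see where the padding comes from, here is a minimal sketch that reuses the df from the question. It calls as.matrix directly, which is the conversion apply performs internally, and contrasts it with a column-wise check via vapply. The names m and is_binary are only illustrative.

```r
df <- data.frame(a = c(0, 1, NA), b = c(0, 1, 0), c = c("d", NA, "c"))

# apply() converts the data.frame with as.matrix() first. Because column c
# is not numeric, the whole matrix becomes character, and the numeric
# column a is formatted to a common width, which pads "0" and "1".
m <- as.matrix(df)
m[, "a"]

# vapply() passes each column as its original vector, so nothing is coerced
# and the %in% comparison works as intended.
is_binary <- vapply(df, function(x) all(x %in% c(0, 1, NA)), logical(1))
is_binary
```

vapply is used here instead of unlist(lapply(...)) only because it checks that each result is a single logical value. Either form avoids the matrix conversion.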
If the first column were already character, the conversion from NA_real_ to NA_character_ would not occur, and neither would the padding, i.e.
df1 <- df
df1$a <- as.character(c(0, 1, NA))
apply(df1, 2, I)
# a b c
#[1,] "0" "0" "d"
#[2,] "1" "1" NA
#[3,] NA "0" "c"
An option is to wrap with trimws to remove the leading spaces:
apply(df,2,function(x){all(trimws(x) %in% c(0,1,NA)) })
# a b c
# TRUE TRUE FALSE
NOTE: For testing the presence of NA, it is recommended to use is.na instead of %in%.
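As a rough sketch of that advice applied to the question's df, the check below tests missing values with is.na and then fills them in the qualifying columns. The helper name only_binary and the fill value 0 are assumptions made for illustration, since the question does not say what the NAs should become.

```r
df <- data.frame(a = c(0, 1, NA), b = c(0, 1, 0), c = c("d", NA, "c"))

# A column qualifies when every value is either missing or one of 0/1.
# only_binary is a hypothetical helper name, not part of base R.
only_binary <- function(x) all(is.na(x) | x %in% c(0, 1))

binary_cols <- vapply(df, only_binary, logical(1))
binary_cols

# Replace NAs only in the qualifying columns; the fill value 0 is assumed.
df[binary_cols] <- lapply(df[binary_cols], function(x) replace(x, is.na(x), 0))
df
```

Column c is left untouched, including its NA, because it contains values other than 0 and 1.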
Upvotes: 1