chapelon

Reputation: 133

sapply and apply give different results with is.character()

I was trying to lowercase the character columns of my data frame and came across this post:
tolower
I built a function to do this on several data frames and eventually discovered that all my columns were being treated as character!

mytolower <- function(p_vector){
  if (is.character(p_vector)) return(tolower(iconv(enc2utf8(p_vector), sub = "byte")))
  else return(p_vector)
}
for (df in c("train", "test")) as.data.frame(apply(get(df), 2, function(x) mytolower(x)), stringsAsFactors = FALSE)

Searching further on Stack Overflow, I found a second post that partially solved the issue by using lapply, but which, curiously, suggests that apply and sapply work in a similar way:
lapply rather than apply
So I put together this example, which illustrates my problem:

train <- data.frame(v1=1:3, v2=c("a","b","c"), v3=11:13, stringsAsFactors = FALSE)
str(train)
apply(train, 2, function(x) is.character(x)) #wrong
lapply(train, function(x) is.character(x)) #right
sapply(train, function(x) is.character(x)) #right
sapply(train, is.character) #right

While apply treats every column as "character", lapply and sapply distinguish numeric columns from character columns. Why is that? Is there a way to make apply give the right answer? Thanks

Upvotes: 2

Views: 1177

Answers (2)

coletl

Reputation: 803

Before is.character() is applied, train is first coerced to a matrix. Since a matrix holds only objects of a single type, all elements become character strings.

From the help file for apply():

If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.
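
To see the coercion directly, here is a small sketch using the train data frame from the question. It only inspects types, and it calls as.matrix() by hand to mimic what apply() does internally according to the help text quoted above.

```r
# Same example data frame as in the question
train <- data.frame(v1 = 1:3, v2 = c("a", "b", "c"), v3 = 11:13,
                    stringsAsFactors = FALSE)

# Column types as stored in the data frame (a list of columns)
col_types <- sapply(train, typeof)
print(col_types)

# What apply() works on: the data frame coerced to a single-type matrix
m <- as.matrix(train)
matrix_type <- typeof(m)
print(matrix_type)

# Every column of the matrix is character, so is.character() is TRUE everywhere
apply_result <- apply(train, 2, is.character)
print(apply_result)
```

Because the check runs on columns of that character matrix, apply() cannot recover the original column types; lapply() and sapply() iterate over the data frame's columns as a list, so no coercion takes place.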

I would suggest using the mutate_if() function from dplyr.

library(dplyr)
mutate_if(train, is.character, toupper)

#    v1 v2 v3
#    1  1  A 11
#    2  2  B 12
#    3  3  C 13
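
If you would rather stay in base R, a rough equivalent is sketched below. It assumes you want tolower() as in the question (the dplyr example above uses toupper()), and the name is_chr is just an illustrative helper.

```r
# Example data with upper-case strings so the effect is visible
train <- data.frame(v1 = 1:3, v2 = c("A", "B", "C"), v3 = 11:13,
                    stringsAsFactors = FALSE)

# Logical vector marking the character columns; vapply guarantees one logical per column
is_chr <- vapply(train, is.character, logical(1))

# Replace only those columns; the other columns keep their original type
train[is_chr] <- lapply(train[is_chr], tolower)

str(train)
```

Since only the selected columns are reassigned, v1 and v3 stay integer and the result is still a data frame.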

Upvotes: 3

anonR

Reputation: 929

The apply function needs a matrix or array as its input, so it coerces the data frame you pass it with as.matrix(). Because a matrix can hold only one type and one of your columns is character, that conversion turns every column into character.

Upvotes: 0
