Reputation: 133
I was trying to convert the character columns of my data frame to lower case and found this post:
tolower
I built a function to do this on several data frames and then discovered that all my columns were being treated as character!
mytolower <- function(p_vector) {
  if (is.character(p_vector)) return(tolower(iconv(enc2utf8(p_vector), sub = "byte")))
  p_vector
}
# Replace each data frame with its converted version (this is the call that turns every column into character)
for (df in c("train", "test")) {
  assign(df, as.data.frame(apply(get(df), 2, mytolower), stringsAsFactors = FALSE))
}
Searching further on Stack Overflow, I found a second post that partially solved the issue by using lapply, but which curiously suggests that apply and sapply work in a similar way:
lapply rather than apply
So I put together this example, which illustrates my problem:
train <- data.frame(v1=1:3, v2=c("a","b","c"), v3=11:13, stringsAsFactors = FALSE)
str(train)
apply(train, 2, function(x) is.character(x)) #wrong
lapply(train, function(x) is.character(x)) #right
sapply(train, function(x) is.character(x)) #right
sapply(train, is.character) #right
While apply considers every column to be "character", lapply and sapply correctly distinguish numeric from character columns. Why is that? Is there a way to make apply give the right answer? Thanks
Upvotes: 2
Views: 1177
Reputation: 803
Before is.character() is applied, train is first coerced to a matrix. Since a matrix holds only objects of a single type, all elements become character strings.
From the help file for apply():
If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.
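The coercion can be checked directly. The following is a minimal sketch that reuses the train data frame from the question; the names m and num_only are introduced here only for illustration.

```r
train <- data.frame(v1 = 1:3, v2 = c("a", "b", "c"), v3 = 11:13,
                    stringsAsFactors = FALSE)

# as.matrix() is what apply() calls internally on a data frame.
# One character column is enough to make the whole matrix character.
m <- as.matrix(train)
is.character(m)

# With only the numeric columns, the matrix stays numeric,
# so apply() reports FALSE for both of them.
num_only <- as.matrix(train[c("v1", "v3")])
is.numeric(num_only)
apply(train[c("v1", "v3")], 2, is.character)
```

So apply() gives the expected answer only when every column already has the same type, which is rarely the case for a data frame.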
I would suggest using the mutate_if() function from dplyr.
library(dplyr)
mutate_if(train, is.character, toupper)
#   v1 v2 v3
# 1  1  A 11
# 2  2  B 12
# 3  3  C 13
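If adding a dependency is not wanted, a base R version along the lines of the lapply approach mentioned in the question might look like the sketch below. It assumes a simplified mytolower (without the iconv/enc2utf8 step), upper-case sample data so the effect is visible, and a hypothetical test data frame, since the question does not show one.

```r
# Lower-case a vector only when it is character; return anything else unchanged.
mytolower <- function(p_vector) {
  if (is.character(p_vector)) tolower(p_vector) else p_vector
}

# Hypothetical sample data.
train <- data.frame(v1 = 1:3, v2 = c("A", "B", "C"), v3 = 11:13,
                    stringsAsFactors = FALSE)
test <- data.frame(v1 = 4:5, v2 = c("D", "E"), v3 = 14:15,
                   stringsAsFactors = FALSE)

# lapply() visits each column as its own vector, so column types are preserved.
# Assigning into train[] keeps the data.frame structure instead of returning a list.
train[] <- lapply(train, mytolower)

# For several data frames, a named list avoids get()/assign() on object names.
dfs <- lapply(list(train = train, test = test), function(d) {
  d[] <- lapply(d, mytolower)
  d
})
str(dfs$test)
```

Unlike apply(), this never builds a matrix, so the integer columns stay integer.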
Upvotes: 3
Reputation: 929
apply() needs a matrix or array as its input, so it coerces the data frame you pass in using as.matrix(). Because the data frame contains a character column, that conversion turns every element into a character string, which is why all columns are reported as character.
Upvotes: 0