Reputation: 109
I am trying to change the data type of the variables in my data frame to 'factor' if they are 'character'. I have replicated the problem using the sample data below:
a <- c("AB","BC","AB","BC","AB","BC")
b <- c(12,23,34,45,54,65)
df <- data.frame(a,b)
str(df)
'data.frame': 6 obs. of 2 variables:
$ a: chr "AB" "BC" "AB" "BC" ...
$ b: num 12 23 34 45 54 65
I wrote the function below to achieve that:
abc <- function(x) {
  for (i in names(x)) {
    if (is.character(x[[i]])) {
      x[[i]] <- as.factor(x[[i]])
    }
  }
}
The function runs without errors if I pass the data frame (df), but it still doesn't change 'character' to 'factor'.
abc(df)
str(df)
'data.frame': 6 obs. of 2 variables:
$ a: chr "AB" "BC" "AB" "BC" ...
$ b: num 12 23 34 45 54 65
NOTE: It works perfectly with a plain for loop and if condition. The problem only appears when I try to generalize it by wrapping it in a function.
Please help. What am I missing?
Upvotes: 0
Views: 429
Reputation: 20399
Besides the comment from @Roland, you should make use of R's nice indexing possibilities and learn about the *apply family. With that you can rewrite your code to
change_to_factor <- function(df_in) {
  chr_ind <- vapply(df_in, is.character, logical(1))
  df_in[, chr_ind] <- lapply(df_in[, chr_ind, drop = FALSE], as.factor)
  df_in
}
Explanation
vapply loops over all elements of a list, applies a function to each element and returns a value of the given type (here a single logical value, logical(1)). Since data frames in R are in fact lists in which every element is required to have the same length, you can conveniently loop over all the columns of the data frame and apply is.character to each column. vapply then returns a logical vector with TRUE/FALSE values depending on whether the column is a character column or not.
lapply is another member of the *apply family; it loops over list elements and returns a list. Here we loop over the character columns, apply as.factor to them and get back a list, which we conveniently store in the original positions in the data frame.
By the way, in R versions before 4.0.0, str(df) shows that column a is already a factor. This is because data.frame used to convert character columns to factors automatically. To avoid that you need to pass stringsAsFactors = FALSE to data.frame (this has been the default since R 4.0.0):
a <- c("AB", "BC", "AB", "BC", "AB", "BC")
b <- c(12, 23, 34, 45, 54, 65)
df <- data.frame(a, b)
str(df) # column a is a factor (R < 4.0.0)
# 'data.frame': 6 obs. of 2 variables:
# $ a: Factor w/ 2 levels "AB","BC": 1 2 1 2 1 2
# $ b: num 12 23 34 45 54 65
str(df2 <- data.frame(a, b, stringsAsFactors = FALSE))
# 'data.frame': 6 obs. of 2 variables:
# $ a: chr "AB" "BC" "AB" "BC" ...
# $ b: num 12 23 34 45 54 65
str(change_to_factor(df2))
# 'data.frame': 6 obs. of 2 variables:
# $ a: Factor w/ 2 levels "AB","BC": 1 2 1 2 1 2
# $ b: num 12 23 34 45 54 65
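To make the explanation above more concrete, here is a minimal sketch that runs the two steps of change_to_factor separately so the intermediate results can be inspected. The variable name converted is introduced here purely for illustration, and the data frame is rebuilt with stringsAsFactors = FALSE so the example behaves the same on any R version.

```r
# Sample data with an explicit character column
df2 <- data.frame(a = c("AB", "BC", "AB", "BC", "AB", "BC"),
                  b = c(12, 23, 34, 45, 54, 65),
                  stringsAsFactors = FALSE)

# Step 1: vapply returns one logical value per column,
# named after the columns of the data frame
chr_ind <- vapply(df2, is.character, logical(1))
print(chr_ind)

# Step 2: lapply converts only the selected columns and returns a list
# (drop = FALSE keeps the selection a data frame even for a single column)
converted <- lapply(df2[, chr_ind, drop = FALSE], as.factor)
str(converted)

# Step 3: store the converted columns back in their original positions
df2[, chr_ind] <- converted
str(df2)
```

With this data, chr_ind is TRUE for column a and FALSE for column b, so only a is touched.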
It may also be worth learning the tidyverse syntax, with which you can simply do
library(tidyverse)
df2 %>%
  mutate_if(is.character, as.factor) %>%
  str()
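Coming back to the original loop: it can be kept as well. Inside a function, x is a local copy, so changes to it are not visible to the caller, and a function whose last expression is a for loop returns NULL. A minimal sketch, assuming the only changes needed are returning the modified copy and assigning the result back:

```r
abc <- function(x) {
  for (i in names(x)) {
    if (is.character(x[[i]])) {
      x[[i]] <- as.factor(x[[i]])
    }
  }
  x  # return the modified copy; without this line the function returns NULL
}

df <- data.frame(a = c("AB", "BC", "AB", "BC", "AB", "BC"),
                 b = c(12, 23, 34, 45, 54, 65),
                 stringsAsFactors = FALSE)

# Calling abc(df) alone leaves df untouched; the result has to be assigned
df <- abc(df)
str(df)
```

This is the same pattern that change_to_factor uses: it ends with df_in, and its result is what gets passed to str().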
Upvotes: 2