Aravindh Rajan

Reputation: 109

Issue with user defined function in R

I am trying to convert the columns of my data frame to 'factor' if they are 'character'. I have replicated the problem with the sample data below.

a <- c("AB","BC","AB","BC","AB","BC")
b <- c(12,23,34,45,54,65)
df <- data.frame(a,b)
str(df)

'data.frame':   6 obs. of  2 variables:
 $ a: chr  "AB" "BC" "AB" "BC" ...
 $ b: num  12 23 34 45 54 65

I wrote the function below to achieve that:

abc <- function(x) {
  for(i in names(x)){
    if(is.character(x[[i]])) {
      x[[i]] <- as.factor(x[[i]])
    }
  }
}

The function runs without error when I pass the data frame (df), but it still doesn't change the 'character' column to 'factor'.

abc(df)

str(df)
'data.frame':   6 obs. of  2 variables:
 $ a: chr  "AB" "BC" "AB" "BC" ...
 $ b: num  12 23 34 45 54 65

NOTE: The for loop and if condition work perfectly on their own. The problem only appears when I try to generalize them by wrapping them in a function.

Please help. What am I missing?

Upvotes: 0

Views: 429

Answers (1)

thothal

Reputation: 20399

Besides the comment from @Roland, you should make use of R's nice indexing possibilities and learn about the *apply family. With that you can rewrite your code to

change_to_factor <- function(df_in) {
    chr_ind <- vapply(df_in, is.character, logical(1))
    df_in[, chr_ind] <- lapply(df_in[, chr_ind, drop = FALSE], as.factor)
    df_in
}

Explanation

  • vapply loops over all elements of a list, applies a function to each element and returns a value of the given type (here a boolean logical(1)). Since in R data frames are in fact lists where each (list) element is required to be of the same length, you can conveniently loop over all the columns of the data frame and apply the function is.character to each column. vapply then returns a boolean (logical) vector with TRUE/FALSE values depending on whether the column was a character column or not.
  • You can then use this boolean vector to subset your data frame to look only at columns which are character columns.
  • lapply is yet another member of the *apply family; it loops through list elements and returns a list. We now loop over the character columns, apply as.factor to each of them, and get back a list that we conveniently store in the original positions in the data frame.
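
To make the three steps above concrete, here is a small sketch that runs them one at a time. The names df_demo, chr_cols and converted are made up for illustration, and stringsAsFactors = FALSE is set explicitly so that column a starts out as character in any R version.

```r
# Illustrative data (hypothetical names, not from the question)
df_demo <- data.frame(
  a = c("AB", "BC", "AB"),
  b = c(12, 23, 34),
  stringsAsFactors = FALSE
)

# Step 1: one logical value per column (TRUE for a, FALSE for b)
chr_ind <- vapply(df_demo, is.character, logical(1))

# Step 2: keep only the character columns; drop = FALSE keeps a data frame
chr_cols <- df_demo[, chr_ind, drop = FALSE]

# Step 3: lapply returns a list holding one factor per character column
converted <- lapply(chr_cols, as.factor)

# Store the converted columns back in their original positions
df_demo[, chr_ind] <- converted
```

After the last line, df_demo$a is a factor and df_demo$b is untouched, which is what change_to_factor does in one go.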

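By the way, if you look at str(df) in an R version before 4.0.0, you will see that column a is already a factor. This is because data.frame there converts character columns to factors by default (since R 4.0.0 the default is stringsAsFactors = FALSE, which matches the output shown in the question). To avoid the conversion in older versions, pass stringsAsFactors = FALSE to data.frame: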

a <- c("AB", "BC", "AB", "BC", "AB", "BC")
b <- c(12, 23, 34, 45, 54, 65)
df <- data.frame(a, b)

str(df) # column a is a factor (R < 4.0.0)
# 'data.frame':   6 obs. of  2 variables:
#  $ a: Factor w/ 2 levels "AB","BC": 1 2 1 2 1 2
#  $ b: num  12 23 34 45 54 65

str(df2 <- data.frame(a, b, stringsAsFactors = FALSE))
# 'data.frame':   6 obs. of  2 variables:
#  $ a: chr  "AB" "BC" "AB" "BC" ...
#  $ b: num  12 23 34 45 54 65

str(change_to_factor(df2))
# 'data.frame':   6 obs. of  2 variables:
#  $ a: Factor w/ 2 levels "AB","BC": 1 2 1 2 1 2
#  $ b: num  12 23 34 45 54 65

It may also be worth learning the tidyverse syntax, with which you can simply do

library(tidyverse)
df2 %>% 
  mutate_if(is.character, as.factor) %>% 
  str()
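
For completeness, the loop from the question can also be kept almost as it is. The comment mentioned at the top of this answer is not reproduced here, so the following is only a sketch, assuming the issue is the usual one: R functions work on a copy of their argument, so the modified copy has to be returned and assigned back. The names abc_fixed and df_demo are made up for illustration.

```r
# Same loop as in the question, with the modified copy returned at the end
abc_fixed <- function(x) {
  for (i in names(x)) {
    if (is.character(x[[i]])) {
      x[[i]] <- as.factor(x[[i]])
    }
  }
  x  # without this line the function returns NULL, the value of the for loop
}

df_demo <- data.frame(
  a = c("AB", "BC"),
  b = c(12, 23),
  stringsAsFactors = FALSE
)

# Assign the result; calling abc_fixed(df_demo) alone leaves df_demo unchanged
df_demo <- abc_fixed(df_demo)
```

Calling the function without assigning the result only prints the converted data frame; the original object stays as it was.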

Upvotes: 2
