mhovd

Reputation: 4087

Why does R convert numbers and characters to factors when coercing to data frame?

Recently I have come across a problem where my data has been converted to factors. This is a real nuisance, as it is not always easy to spot.

I am aware that I can convert them back with solutions such as as.numeric(as.character(x)) or as.character(x), but that seems really unnecessary.
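For reference, a minimal sketch of that back-conversion, using two hypothetical factors (f_num and f_chr are made-up names, not part of my real data). It shows why the as.character() step matters for numbers: calling as.numeric() directly on a factor returns the internal level codes rather than the values.

```r
# Hypothetical factors, built explicitly for illustration
f_num <- factor(c("10", "20", "30"))
f_chr <- factor(c("A", "B", "C"))

as.numeric(f_num)                # internal level codes: 1 2 3, not the values
as.numeric(as.character(f_num))  # the original values: 10 20 30
as.character(f_chr)              # "A" "B" "C"
```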

Example code:

nums <- c(1,2,3,4,5)
chars <- c("A","B","C,","D","E")
str(nums)
#>  num [1:5] 1 2 3 4 5
str(chars)
#>  chr [1:5] "A" "B" "C," "D" "E"
df <- as.data.frame(cbind(a = nums, b = chars))
str(df)
#> 'data.frame':    5 obs. of  2 variables:
#>  $ a: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5
#>  $ b: Factor w/ 5 levels "A","B","C,","D",..: 1 2 3 4 5

Upvotes: 3

Views: 1201

Answers (2)

Guilherme D. Garcia

Reputation: 179

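This should no longer happen if you have updated R: since R 4.0.0, data.frame() and as.data.frame() default to stringsAsFactors = FALSE, so character columns are no longer converted to factors automatically. In this respect, data frames now behave more like tibbles.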
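A quick way to check which behaviour your installation has is sketched below. It reuses nums and chars from the question and assumes nothing beyond base R. On older versions the result can also depend on the stringsAsFactors option, so treat the comments as the typical case rather than a guarantee.

```r
nums <- c(1, 2, 3, 4, 5)
chars <- c("A", "B", "C,", "D", "E")

# No stringsAsFactors argument: the result depends on the default in your R version
df <- data.frame(a = nums, b = chars)

getRversion() >= "4.0.0"  # TRUE means the default is stringsAsFactors = FALSE
class(df$b)               # "character" under the new default, typically "factor" before
```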

Upvotes: 1

Ronak Shah

Reputation: 389325

  1. Don't use cbind: it converts the data to a matrix, and a matrix can hold only one type of data, so the numbers are coerced to characters.

  2. Use data.frame rather than as.data.frame, because as.data.frame(a = nums, b = chars) returns an error.

  3. Use stringsAsFactors = FALSE, because in R versions before 4.0.0 the default value of stringsAsFactors in data.frame is TRUE, which converts characters to factors. The numbers also become factors because they were already changed to characters in step 1.

    df <- data.frame(a = nums, b = chars, stringsAsFactors = FALSE)
    str(df)
    #'data.frame':  5 obs. of  2 variables:
    # $ a: num  1 2 3 4 5
    # $ b: chr  "A" "B" "C," "D" ...
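
To see point 1 in action, here is a minimal sketch that reuses nums and chars from the question (m and df_bad are names made up for this illustration). It shows that the coercion happens inside cbind, before any data frame exists, so stringsAsFactors = FALSE alone does not bring the numbers back.

```r
nums <- c(1, 2, 3, 4, 5)
chars <- c("A", "B", "C,", "D", "E")

m <- cbind(a = nums, b = chars)
is.matrix(m)   # TRUE
typeof(m)      # "character": the numbers were coerced to strings
m[, "a"]       # "1" "2" "3" "4" "5"

# Even with stringsAsFactors = FALSE, column a stays character, not numeric
df_bad <- as.data.frame(m, stringsAsFactors = FALSE)
class(df_bad$a)
```

Building the data frame directly with data.frame(), as above, avoids the intermediate matrix and keeps each column's own type.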
    

EDIT: As of R 4.0.0, the default value of stringsAsFactors has changed to FALSE.

Upvotes: 4
