Reputation: 11115
I have a data frame. Let's call him bob:
> head(bob)
phenotype exclusion
GSM399350 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399351 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399352 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399353 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399354 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399355 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
I'd like to concatenate the rows of this data frame (this will be another question). But look:
> class(bob$phenotype)
[1] "factor"
Bob's columns are factors. So, for example:
> as.character(head(bob))
[1] "c(3, 3, 3, 6, 6, 6)" "c(3, 3, 3, 3, 3, 3)"
[3] "c(29, 29, 29, 30, 30, 30)"
I don't begin to understand this, but I guess these are indices into the levels of the factors of the columns (of the court of king caractacus) of bob? Not what I need.
Strangely, I can go through the columns of bob by hand and do
bob$phenotype <- as.character(bob$phenotype)
which works fine. And, after some typing, I can get a data.frame whose columns are characters rather than factors. So my question is: how can I do this automatically? How do I convert a data.frame with factor columns into a data.frame with character columns without having to manually go through each column?
Bonus question: why does the manual approach work?
Upvotes: 405
Views: 770297
Reputation: 50744
To replace only factors:
i <- sapply(bob, is.factor)
bob[i] <- lapply(bob[i], as.character)
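For anyone who wants to try this without the original data, here is a minimal sketch on a made-up data frame (toy and its columns are hypothetical stand-ins for bob, not taken from the question):

```r
# Hypothetical toy data frame standing in for bob
toy <- data.frame(
  phenotype = factor(c("3-", "4-", "8-")),
  count     = c(1L, 2L, 3L)
)

# Logical vector: TRUE for each factor column
i <- sapply(toy, is.factor)

# Convert only those columns; the integer column is left alone
toy[i] <- lapply(toy[i], as.character)

sapply(toy, class)
```

Because the assignment goes through toy[i], the object stays a data frame and only the selected columns change type.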
In dplyr version 0.5.0, the new function mutate_if was introduced:
library(dplyr)
bob %>% mutate_if(is.factor, as.character) -> bob
...and in version 1.0.0 it was superseded by across:
library(dplyr)
bob %>% mutate(across(where(is.factor), as.character)) -> bob
Package purrr from RStudio gives another alternative:
library(purrr)
bob %>% modify_if(is.factor, as.character) -> bob
Upvotes: 349
Reputation: 940
A new function, across, was introduced in dplyr version 1.0.0. It supersedes the scoped variants (_if, _at, _all). See the official dplyr documentation for details.
library(dplyr)
bob <- bob %>%
  mutate(across(where(is.factor), as.character))
Upvotes: 3
Reputation: 592
With the dplyr package loaded, use
bob <- bob %>% mutate_at("phenotype", as.character)
if you only want to change the phenotype column specifically.
Upvotes: 2
Reputation: 3056
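This converts every column to character and then converts the columns that contain only numbers back to numeric:
makenumcols <- function(df) {
  df <- as.data.frame(df)
  # Convert every column to character first
  df[] <- lapply(df, as.character)
  # A column counts as numeric if all of its non-NA values parse as numbers
  cond <- vapply(df, function(x) {
    x <- x[!is.na(x)]
    all(suppressWarnings(!is.na(as.numeric(x))))
  }, logical(1))
  numeric_cols <- names(df)[cond]
  # List-style indexing also works when zero or one column qualifies
  df[numeric_cols] <- lapply(df[numeric_cols], as.numeric)
  return(df)
}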
Adapted from: Get column types of excel sheet automatically
Upvotes: 0
Reputation: 57
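Maybe a newer option? Note that group_by_if also groups the result by the converted columns, so it needs an ungroup() afterwards:
library("tidyverse")
bob <- bob %>% group_by_if(is.factor, as.character) %>% ungroup()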
Upvotes: 0
Reputation: 1960
You should use convert in hablar, which gives readable syntax compatible with tidyverse pipes:
library(dplyr)
library(hablar)
df <- tibble(a = factor(c(1, 2, 3, 4)),
             b = factor(c(5, 6, 7, 8)))
df %>% convert(chr(a:b))
which gives you:
a b
<chr> <chr>
1 1 5
2 2 6
3 3 7
4 4 8
Upvotes: 2
Reputation: 2393
If you understand how factors are stored, you can avoid using apply-based functions to accomplish this. That isn't at all to imply that the apply solutions don't work well.
Factors are structured as numeric indices tied to a list of 'levels'. This can be seen if you convert a factor to numeric. So:
> fact <- as.factor(c("a","b","a","d"))
> fact
[1] a b a d
Levels: a b d
> as.numeric(fact)
[1] 1 2 1 3
The numbers returned in the last line correspond to the levels of the factor.
> levels(fact)
[1] "a" "b" "d"
Notice that levels() returns a character vector. You can use this fact to easily and compactly convert factors to strings or numerics like this:
> fact_character <- levels(fact)[as.numeric(fact)]
> fact_character
[1] "a" "b" "a" "d"
This also works for numeric values, provided you wrap your expression in as.numeric().
> num_fact <- factor(c(1,2,3,6,5,4))
> num_fact
[1] 1 2 3 6 5 4
Levels: 1 2 3 4 5 6
> num_num <- as.numeric(levels(num_fact)[as.numeric(num_fact)])
> num_num
[1] 1 2 3 6 5 4
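The example above happens to have values that match their level positions, which hides the usual pitfall. A small sketch with made-up values (f is hypothetical) shows the difference between the internal codes and the actual numbers:

```r
# Hypothetical factor whose labels are numbers that differ from their positions
f <- factor(c(10, 20, 20, 5))

# Internal level codes, not the original values
codes <- as.numeric(f)

# Convert the level labels to numbers, then index them by the factor
values <- as.numeric(levels(f))[f]

# Equivalent result via the character representation
same <- as.numeric(as.character(f))

codes
values
```

Here codes holds the positions in levels(f), while values and same recover the original numbers.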
Upvotes: 26
Reputation: 368479
The global option stringsAsFactors ("The default setting for arguments of data.frame and read.table.") may be something you want to set to FALSE in your startup files (e.g. ~/.Rprofile). Please see help(options).
Upvotes: 43
Reputation:
When you create your data frame, include stringsAsFactors = FALSE to avoid all misunderstandings.
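As a rough illustration of what the argument does at creation time (the column x and its values are made up), passing it explicitly makes the result independent of whatever default your R version uses:

```r
# Same input, only the stringsAsFactors argument differs
as_fct <- data.frame(x = c("a", "b"), stringsAsFactors = TRUE)
as_chr <- data.frame(x = c("a", "b"), stringsAsFactors = FALSE)

class(as_fct$x)  # "factor"
class(as_chr$x)  # "character"
```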
Upvotes: 7
Reputation: 16697
If you use the data.table package for operations on your data.frame, the problem does not arise.
library(data.table)
dt = data.table(col1 = c("a","b","c"), col2 = 1:3)
sapply(dt, class)
# col1 col2
#"character" "integer"
If you already have factor columns in your dataset and want to convert them to character, you can do the following.
library(data.table)
dt = data.table(col1 = factor(c("a","b","c")), col2 = 1:3)
sapply(dt, class)
# col1 col2
# "factor" "integer"
upd.cols = sapply(dt, is.factor)
dt[, names(dt)[upd.cols] := lapply(.SD, as.character), .SDcols = upd.cols]
sapply(dt, class)
# col1 col2
#"character" "integer"
Upvotes: 6
Reputation: 100194
Just following on Matt and Dirk. If you want to recreate your existing data frame without changing the global option, you can recreate it with an lapply statement:
bob <- data.frame(lapply(bob, as.character), stringsAsFactors=FALSE)
This will convert all variables to class "character". If you want to convert only factors, see Marek's solution.
As @hadley points out, the following is more concise:
bob[] <- lapply(bob, as.character)
In both cases, lapply outputs a list; however, owing to the magical properties of R, the use of [] in the second case keeps the data.frame class of the bob object, thereby eliminating the need to convert back to a data.frame using as.data.frame with the argument stringsAsFactors = FALSE.
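To make the difference concrete, here is a small sketch on a made-up data frame (toy, a and b are hypothetical names) comparing the plain lapply result with the [] assignment:

```r
# Hypothetical data frame with two factor columns
toy <- data.frame(a = factor(c("x", "y")), b = factor(c("p", "q")))

# lapply on its own returns a plain list of character vectors
as_list <- lapply(toy, as.character)

# Assigning into toy[] replaces the columns but keeps the data frame
toy[] <- lapply(toy, as.character)

class(as_list)  # "list"
class(toy)      # "data.frame"
```

The empty brackets mean "all columns of the existing object", so the attributes of toy (class, names, row names) are retained.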
Upvotes: 396
Reputation: 2480
This works for me - I finally figured out a one-liner:
df <- as.data.frame(lapply(df, function(y) if (is.factor(y)) as.character(y) else y), stringsAsFactors = FALSE)
Upvotes: 3
Reputation: 8744
I typically make this function a part of all my projects. Quick and easy.
unfactorize <- function(df) {
  # is.factor also catches ordered factors, whose class vector has two elements
  for (i in which(sapply(df, is.factor))) df[[i]] <- as.character(df[[i]])
  return(df)
}
Upvotes: 16
Reputation: 729
If you want a new data frame bobc where every factor vector in bobf is converted to a character vector, try this:
bobc <- rapply(bobf, as.character, classes="factor", how="replace")
If you then want to convert it back, you can create a logical vector of which columns are factors, and use that to selectively apply factor:
f <- sapply(bobf, is.factor)
bobc[f] <- lapply(bobc[f], factor)
Upvotes: 22
Reputation: 30485
Another way is to convert it using apply:
bob2 <- apply(bob, 2, as.character)
And a better one (the previous is of class 'matrix'):
bob2 <- as.data.frame(as.matrix(bob), stringsAsFactors = FALSE)
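One thing to watch with the matrix route: a matrix holds a single type, so any numeric columns come back as character too. A minimal sketch, assuming a made-up data frame with one factor and one numeric column:

```r
# Hypothetical mixed data frame: one factor column, one numeric column
toy <- data.frame(g = factor(c("a", "b")), n = c(1.5, 10))

# Going through a character matrix converts every column
toy2 <- as.data.frame(as.matrix(toy), stringsAsFactors = FALSE)

sapply(toy2, class)  # both columns are now character
```

If the data frame contains only factor and character columns, this side effect does not matter.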
Upvotes: 11
Reputation: 36120
Or you can try transform:
newbob <- transform(bob, phenotype = as.character(phenotype))
Just be sure to list every factor you'd like to convert to character.
Or you can do something like this and kill all the pests with one blow:
newbob_char <- as.data.frame(lapply(bob[sapply(bob, is.factor)], as.character), stringsAsFactors = FALSE)
newbob_rest <- bob[!(sapply(bob, is.factor))]
newbob <- cbind(newbob_char, newbob_rest)
It's not a good idea to shove the data into code like this; I could do the sapply part separately (actually, it's much easier to do it like that), but you get the point... I haven't checked the code, 'cause I'm not at home, so I hope it works! =)
This approach, however, has a downside... you must reorganize the columns afterwards, while with transform you can do whatever you like, but at the cost of "pedestrian-style code writing"...
So there... =)
Upvotes: 8
Reputation: 27359
Update: Here's an example of something that doesn't work. I thought it would, but I think that the stringsAsFactors option only works on character strings - it leaves the factors alone.
Try this:
bob2 <- data.frame(bob, stringsAsFactors = FALSE)
Generally speaking, whenever you're having problems with factors that should be characters, there's a stringsAsFactors setting somewhere to help you (including a global setting).
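A quick sketch of the behaviour described in the update, on a made-up data frame (toy and x are hypothetical): the argument only affects character vectors passed in, so an existing factor column has to be converted explicitly.

```r
# Hypothetical data frame that already contains a factor column
toy <- data.frame(x = factor(c("a", "b")))

# stringsAsFactors = FALSE does not touch columns that are already factors
toy2 <- data.frame(toy, stringsAsFactors = FALSE)
still_factor <- is.factor(toy2$x)

# Explicit conversion of the factor columns is still required
toy2[] <- lapply(toy2, function(col) if (is.factor(col)) as.character(col) else col)

still_factor
class(toy2$x)
```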
Upvotes: 9