Reputation: 338
Using the adult data set as an example, the character values can be converted to factors like this:
adult <- read.csv('https://raw.githubusercontent.com/InfiniteCuriosity/templatedemo/main/Adult.csv')
library(tidyverse)
df <- mutate_if(adult, is.character, as.factor)
That converts the character columns to factors. I don't see a way to convert those factors to their factor levels, though. The warning in ?factor states, "To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f))", but I don't see how to do that for every factor column in a data set such as the adult data set. I can convert them to numbers, but not to factor levels.
The conversion to numbers is:
df <- mutate_if(df, is.factor, as.numeric)
If I try as.numeric(levels(df))[df], as suggested by ?as.factor, that returns the error:
Error in as.numeric(levels(df))[df] : invalid subscript type 'closure'
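For reference, the idiom quoted from ?factor operates on a single factor, not on a whole data frame, so it has to be applied column by column. The 'closure' in the error message also likely means that df resolved to R's built-in function stats::df at that point rather than to the data frame. Below is a minimal base R sketch on a small made-up data frame; toy and its columns are hypothetical, not taken from Adult.csv.

```r
# Toy data standing in for the real data set (hypothetical values).
toy <- data.frame(
  score = factor(c("10", "20", "10", "30")),  # factor whose labels are numbers
  sex   = factor(c("Male", "Female", "Male", "Female")),
  age   = c(50L, 38L, 53L, 28L)
)

# On a single factor, the idiom from ?factor recovers the numeric labels...
f <- toy$score
labels_as_numbers <- as.numeric(levels(f))[f]

# ...while as.numeric() on the factor returns the internal integer codes.
codes <- as.numeric(f)

# To get integer codes for every factor column, apply as.integer()
# to the factor columns only and leave the other columns untouched.
is_fac <- vapply(toy, is.factor, logical(1))
toy_codes <- toy
toy_codes[is_fac] <- lapply(toy[is_fac], as.integer)
str(toy_codes)
```

The as.numeric(levels(f))[f] form is only meaningful when the labels are themselves numbers; for labels such as "Male" it would return NA with a warning, which is why the sketch uses as.integer() for the general case.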
Upvotes: 2
Views: 1712
Reputation: 72593
Just use type.convert, setting as.is=FALSE.
> adult <- type.convert(adult, as.is=FALSE)
> str(adult)
'data.frame': 100 obs. of 16 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ age : int 50 38 53 28 37 49 52 31 42 37 ...
$ workclass : Factor w/ 7 levels " ?"," Federal-gov",..: 6 4 4 4 4 4 6 4 4 4 ...
$ fnlwgt : int 83311 215646 234721 338409 284582 160187 209642 45781 159449 280464 ...
$ education : Factor w/ 13 levels " 10th"," 11th",..: 8 10 2 8 11 5 10 11 8 13 ...
$ education_num : int 13 9 7 13 14 5 9 14 13 10 ...
$ marital_status: Factor w/ 6 levels " Divorced"," Married-AF-spouse",..: 3 1 3 3 3 4 3 5 3 3 ...
$ occupation : Factor w/ 13 levels " ?"," Adm-clerical",..: 4 6 6 9 4 8 4 9 4 4 ...
$ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 1 6 6 2 1 2 1 1 ...
$ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 5 3 ...
$ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 1 2 2 ...
$ capital_gain : int 0 0 0 0 0 0 0 14084 5178 0 ...
$ capital_loss : int 0 0 0 0 0 0 0 0 0 0 ...
$ hours_per_week: int 13 40 40 40 40 16 45 50 40 80 ...
$ native_country: Factor w/ 10 levels " ?"," Cuba"," England",..: 10 10 10 2 10 6 10 10 10 10 ...
$ income : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
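The same call can be tried without downloading anything. The sketch below uses a small made-up data frame (toy and its values are hypothetical, chosen to mimic the leading spaces visible in the levels above) and also shows one possible way to trim that whitespace before converting.

```r
# Toy character data (hypothetical values, with leading spaces as in the output above).
toy <- data.frame(
  age    = c("50", "38", "53"),
  sex    = c(" Male", " Female", " Male"),
  income = c(" <=50K", " >50K", " <=50K"),
  stringsAsFactors = FALSE
)

# as.is = FALSE: character columns that do not parse as numbers become factors,
# numeric-looking ones become integer or double.
converted <- type.convert(toy, as.is = FALSE)
str(converted)

# Optional: trim whitespace first so the levels are "Male" rather than " Male".
toy[] <- lapply(toy, trimws)
trimmed <- type.convert(toy, as.is = FALSE)
levels(trimmed$sex)

# Integer codes of a converted factor column, if plain numbers are needed.
sex_codes <- as.integer(trimmed$sex)
```

When reading from a file, passing strip.white = TRUE to read.csv is another way to avoid the leading spaces in the first place.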
Upvotes: 2