Russ Conte

Reputation: 338

How to convert character to factor levels for an entire data set

Using the adult data set as an example, the character values can be converted to factors like this:

adult <- read.csv('https://raw.githubusercontent.com/InfiniteCuriosity/templatedemo/main/Adult.csv')

library(tidyverse)
df <- mutate_if(adult, is.character, as.factor)

That converts the character columns to factors, but I don't see a way to convert those factors to factor levels. The warning in ?factor states, "To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f))", but I don't see how to do that for every factor column in a data set such as the adult data set. I can convert the factors to numbers, but not to factor levels.

The conversion to numbers is:

df <- mutate_if(df, is.factor, as.numeric)

If I try as.numeric(levels(df))[df], as suggested by ?as.factor, that returns the error:

Error in as.numeric(levels(df))[df] : invalid subscript type 'closure'

Upvotes: 2

Views: 1712

Answers (1)

jay.sf

Reputation: 72593

Just use type.convert with as.is=FALSE. It converts the character columns to factors and leaves the integer columns as they are.

> adult <- type.convert(adult, as.is=FALSE)
> str(adult)
'data.frame':   100 obs. of  16 variables:
 $ X             : int  1 2 3 4 5 6 7 8 9 10 ...
 $ age           : int  50 38 53 28 37 49 52 31 42 37 ...
 $ workclass     : Factor w/ 7 levels " ?"," Federal-gov",..: 6 4 4 4 4 4 6 4 4 4 ...
 $ fnlwgt        : int  83311 215646 234721 338409 284582 160187 209642 45781 159449 280464 ...
 $ education     : Factor w/ 13 levels " 10th"," 11th",..: 8 10 2 8 11 5 10 11 8 13 ...
 $ education_num : int  13 9 7 13 14 5 9 14 13 10 ...
 $ marital_status: Factor w/ 6 levels " Divorced"," Married-AF-spouse",..: 3 1 3 3 3 4 3 5 3 3 ...
 $ occupation    : Factor w/ 13 levels " ?"," Adm-clerical",..: 4 6 6 9 4 8 4 9 4 4 ...
 $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 1 6 6 2 1 2 1 1 ...
 $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 5 3 ...
 $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 1 2 2 ...
 $ capital_gain  : int  0 0 0 0 0 0 0 14084 5178 0 ...
 $ capital_loss  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ hours_per_week: int  13 40 40 40 40 16 45 50 40 80 ...
 $ native_country: Factor w/ 10 levels " ?"," Cuba"," England",..: 10 10 10 2 10 6 10 10 10 10 ...
 $ income        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
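If the goal is to get at the level labels or the underlying integer codes for every factor column at once, the same idea can be sketched on a small made-up data frame. The `toy` data frame and the names `is_fac`, `lvls` and `codes` are illustrative only and are not part of the adult data set. Note that the `as.numeric(levels(f))[f]` idiom from ?factor applies to a single factor whose levels are numeric strings, so it is not used here.

```r
# Toy data frame standing in for the adult data (values are made up)
toy <- data.frame(
  age       = c(50L, 38L, 53L),
  workclass = c("Private", "State-gov", "Private"),
  sex       = c("Male", "Female", "Male"),
  stringsAsFactors = FALSE
)

# as.is = FALSE turns character columns into factors; integers stay integers
toy <- type.convert(toy, as.is = FALSE)

# Flag the factor columns
is_fac <- vapply(toy, is.factor, logical(1))

# Level labels of each factor column, as a named list
lvls <- lapply(toy[is_fac], levels)

# Copy of the data with each factor replaced by its integer codes
codes <- toy
codes[is_fac] <- lapply(toy[is_fac], as.integer)
```

Here `lvls` holds the labels per column and `codes` holds the positions of those labels, so `lvls$workclass[codes$workclass]` recovers the original strings.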

Upvotes: 2
