stackoverflowuser2010

Reputation: 40899

R - changing factors to numerics with specific mappings

Suppose I've read in a data frame, where a column contains strings as factors. I would like to convert the factors to numerics but with specific mappings. This conversion is typically a precursor step for a later calculation. For example:

> library(rpart)

> head(car90["Type"])
                 Type
Acura Integra   Small
Acura Legend   Medium
Audi 100       Medium
Audi 80       Compact
BMW 325i      Compact
BMW 535i       Medium

> summary(car90$Type)
Compact   Large  Medium   Small  Sporty     Van    NA's 
     19       7      26      22      21      10       6

In the car90$Type column, I would like to set 'Compact' to be -10, 'Large' to be -1, 'Medium' to be 0, 'Small' to be 1, 'Sporty' to be 10, and 'Van' to be 20, where the numbers are numerics, not factors. How would I do that?

I have already looked at related questions, but none provided a solution.

Replace specific column "words" into number or blank

Changing column names of a data frame in R

Replace contents of factor column in R dataframe

Convert factor to integer

Upvotes: 3

Views: 484

Answers (6)

BrodieG

Reputation: 52637

Just reset the levels:

levels(car90$Type) <- c(-10, -1, 0, 1, 10, 20)

Leads to (same head/subset as you):

#               Type
# Acura Integra    1
# Acura Legend     0
# Audi 100         0
# Audi 80        -10
# BMW 325i       -10
# BMW 535i         0

Beware, though: if you intend to compute on this, you must then use as.numeric(levels(fac))[fac] (where fac is the factor, here car90$Type) so that you compute on the numbers, not the underlying integer codes of the factor.
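To make the difference concrete, here is a minimal sketch on a toy factor. The toy data is an assumption for illustration; it stands in for car90$Type and uses the same six levels.

```r
# Toy factor standing in for car90$Type (illustrative data, not from car90)
fac <- factor(c("Small", "Medium", "Compact", NA, "Van"),
              levels = c("Compact", "Large", "Medium", "Small", "Sporty", "Van"))

# Relabel the levels in their existing order
levels(fac) <- c(-10, -1, 0, 1, 10, 20)

# Not what you want here: these are the internal integer codes (level positions)
codes <- as.numeric(fac)

# What you want: turn the level labels into numbers, then index by the factor
nums <- as.numeric(levels(fac))[fac]

print(codes)
print(nums)
```

Here codes holds the level positions (4, 3, 1, NA, 6), while nums holds the mapped values (1, 0, -10, NA, 20).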

Upvotes: 0

thelatemail

Reputation: 93813

As @NealFultz notes, vector subscripting can achieve this. You must be careful with how you do this operation, though:

x <- car90$Type[1:10]
#[1] Small   Medium  Medium  Compact Compact Medium  Medium  Large   Large   <NA>
#Levels: Compact Large Medium Small Sporty Van

I.e.:

vals <- c(Compact=-10,Large=-1,Medium=0,Small=1,Sporty=10,Van=20)
vals[x]

Will give the correct result as the order in vals is the same as the levels in the factor x:

vals[x]
#  Small  Medium  Medium Compact Compact  Medium  Medium   Large   Large    <NA> 
#      1       0       0     -10     -10       0       0      -1      -1      NA 

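This silently returns the wrong values if the order in vals differs from the order of the factor levels, because a factor used as a subscript indexes by its integer codes, e.g.: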

vals <- c(Large=-1,Compact=-10,Medium=0,Small=1,Sporty=10,Van=20)
vals[x]
#  Small  Medium  Medium   Large   Large  Medium  Medium Compact Compact    <NA> 
#      1       0       0      -1      -1       0       0     -10     -10      NA 

You can get around this by subsetting based on comparing the character representation in x to the names of vals rather than the order, like:

vals <- c(Large=-1,Compact=-10,Medium=0,Small=1,Sporty=10,Van=20)
vals[as.character(x)]
#  Small  Medium  Medium Compact Compact  Medium  Medium   Large   Large    <NA> 
#      1       0       0     -10     -10       0       0      -1      -1      NA 
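If you want a plain numeric vector to assign back into the data frame, one possible sketch is below. It uses toy data as a stand-in for car90$Type, and the guard against unmapped levels is my own addition rather than something the approach requires.

```r
# Toy factor standing in for car90$Type (illustrative data)
x <- factor(c("Small", "Medium", "Compact", NA, "Large"),
            levels = c("Compact", "Large", "Medium", "Small", "Sporty", "Van"))

# Mapping deliberately listed in a different order from the levels
vals <- c(Large = -1, Compact = -10, Medium = 0, Small = 1, Sporty = 10, Van = 20)

# Guard (optional): fail early if any level has no mapping
stopifnot(all(levels(x) %in% names(vals)))

# Look up by name rather than by position, then drop the names
num <- unname(vals[as.character(x)])

print(num)
```

The result is an unnamed numeric vector with NA wherever x was NA, so something like car90$TypeValue <- num (TypeValue being a hypothetical column name) would then behave as an ordinary numeric column.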

Upvotes: 1

Said Montiel

Reputation: 36

Use merge() as in the following example.

First create a data frame with the values you want. In this scenario you would write

 dictionary <- data.frame(Type = c('Compact', 'Large', 'Medium', 'Small', 'Sporty', 'Van'),
                     Values = c(-10, -1, 0, 1, 10, 20))

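 # car90["Type"] keeps the column name, so merge() can match on "Type";
 # a bare vector (car90$Type) has no shared column name and yields a cross join
 output <- merge(car90["Type"], dictionary)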

IMPORTANT: This example doesn't take NA into account. If you want to give those a value as well, you'll need to include NA as a type with its own value. Otherwise those rows won't be part of the output, unless you pass all.x = TRUE to merge(), which keeps them with an NA value.

And the resulting data frame is formatted as you want it.

NOTE: It's easier if the columns are named exactly the same, but you can specify the columns to join on with by.x and by.y. Check the documentation of merge() for more.
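As a rough illustration of the NA behaviour, here is a small sketch. The cars data frame and its Model column are made up for the example; only dictionary follows the definition above.

```r
# Made-up data frame for illustration; one row has a missing Type
cars <- data.frame(Model = c("A", "B", "C", "D"),
                   Type  = c("Small", NA, "Van", "Compact"))

dictionary <- data.frame(Type   = c("Compact", "Large", "Medium", "Small", "Sporty", "Van"),
                         Values = c(-10, -1, 0, 1, 10, 20))

# Default merge: rows whose Type has no match in dictionary are dropped
inner <- merge(cars, dictionary)

# all.x = TRUE: every row of cars is kept; unmatched rows get NA in Values
outer <- merge(cars, dictionary, all.x = TRUE)

print(nrow(inner))
print(outer)
```

Note that merge() sorts the result by the join column by default, so the output rows are generally not in the original order.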

Upvotes: 0

Hugh

Reputation: 16090

This is a join operation:

encode <- data.frame(Type = c("Compact", "Large", "Medium", "Small", "Sporty", "Van"), TypeValue = c(-10,-1,0,1,10,20))

car90 <- merge(car90, encode, all.x = TRUE)

# or using dplyr
library(dplyr)
car90 <- left_join(car90, encode, by = "Type")  # name the key explicitly
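One thing to watch: car90 stores the car names as row names, and merge() returns a data frame with fresh row names in a different row order. A possible workaround, sketched on a toy data frame, is to carry the row names through the join as an ordinary column. The toy rows and the temporary Model column are assumptions for illustration.

```r
# Toy stand-in for car90: Type column plus car names as row names
cars <- data.frame(Type = c("Small", "Medium", NA),
                   row.names = c("Acura Integra", "Acura Legend", "Hypothetical Car"))

encode <- data.frame(Type = c("Compact", "Large", "Medium", "Small", "Sporty", "Van"),
                     TypeValue = c(-10, -1, 0, 1, 10, 20))

# Carry the row names through the join in a temporary column
tmp <- cars
tmp$Model <- rownames(cars)
merged <- merge(tmp, encode, by = "Type", all.x = TRUE)

# Restore the row names, drop the helper column, and restore the original order
rownames(merged) <- merged$Model
merged$Model <- NULL
merged <- merged[rownames(cars), , drop = FALSE]

print(merged)
```

After this, merged has the same row names and row order as cars, with TypeValue added and NA where Type was missing.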

Upvotes: 0

Bangyou

Reputation: 9816

You can try this:

x <- c('Compact', 'Large', 'Medium', 'Small', 'Sporty', 'Van') 
y <-  factor(x, levels = c('Compact', 'Large', 'Medium', 'Small', 'Sporty', 'Van'), 
    labels = c(-10, -1, 0, 1, 10, 20))
as.numeric(as.character(y))


[1] -10  -1   0   1  10  20

For your case, you can call:

car90$Type <-  factor(car90$Type, levels = c('Compact', 'Large', 'Medium', 'Small', 'Sporty', 'Van'), 
    labels = c(-10, -1, 0, 1, 10, 20))
car90$Type <-  as.numeric(as.character(car90$Type))

Upvotes: 1

Neal Fultz

Reputation: 9687

I would just use vector subscripting; here's an example:

R>a <- as.factor(c("C", "L", "M", "L", "C"))
R>a
[1] C L M L C
Levels: C L M
R>b <- c(C=-10,L=-1,M=0)
R>b
  C   L   M 
-10  -1   0 
R>
R>b[a]
  C   L   M   L   C 
-10  -1   0  -1 -10 
R>

Upvotes: 1
