Reputation: 40899
Suppose I've read in a data frame, where a column contains strings as factors. I would like to convert the factors to numerics but with specific mappings. This conversion is typically a precursor step for a later calculation. For example:
> library(rpart)
> head(car90["Type"])
                 Type
Acura Integra   Small
Acura Legend   Medium
Audi 100       Medium
Audi 80       Compact
BMW 325i      Compact
BMW 535i       Medium
> summary(car90$Type)
Compact   Large  Medium   Small  Sporty     Van    NA's
     19       7      26      22      21      10       6
In the car90$Type column, I would like to set 'Compact' to be -10, 'Large' to be -1, 'Medium' to be 0, 'Small' to be 1, 'Sporty' to be 10, and 'Van' to be 20, where the numbers are numerics, not factors. How would I do that?
I have already looked at related questions, but none provided a solution.
Replace specific column "words" into number or blank
Changing column names of a data frame in R
Replace contents of factor column in R dataframe
Upvotes: 3
Views: 484
Reputation: 52637
Just reset the levels:
levels(car90$Type) <- c(-10, -1, 0, 1, 10, 20)
Leads to (same head/subset as you):
#               Type
# Acura Integra    1
# Acura Legend     0
# Audi 100         0
# Audi 80        -10
# BMW 325i       -10
# BMW 535i         0
Beware, though: the column is still a factor at this point. If you intend to compute on it, you must convert it with as.numeric(levels(fac))[fac] so that you compute on the numbers, not on the factor's underlying integer codes.
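The difference between the two conversions can be seen in a minimal sketch. The factor below is toy data standing in for car90$Type (an assumption for illustration, not the real column):

```r
# Toy factor with the same six levels as car90$Type (hypothetical data).
fac <- factor(c("Small", "Medium", "Compact", NA, "Van"),
              levels = c("Compact", "Large", "Medium", "Small", "Sporty", "Van"))

# Relabel the levels in their existing (alphabetical) order.
levels(fac) <- c(-10, -1, 0, 1, 10, 20)

# as.numeric() on the factor itself returns the internal integer codes.
codes <- as.numeric(fac)

# Converting the level labels to numbers first, then indexing by the factor,
# recovers the intended values; NA entries stay NA.
values <- as.numeric(levels(fac))[fac]

print(codes)
print(values)
```

Only the second form gives the mapped numbers; the first gives the position of each level.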
Upvotes: 0
Reputation: 93813
As @NealFultz notes, vector subscripting can achieve this. You must be careful with how you do it, though:
x <- car90$Type[1:10]
#[1] Small Medium Medium Compact Compact Medium Medium Large Large <NA>
#Levels: Compact Large Medium Small Sporty Van
I.e.:
vals <- c(Compact=-10,Large=-1,Medium=0,Small=1,Sporty=10,Van=20)
vals[x]
This gives the correct result only because the order of vals is the same as the order of the levels in the factor x (indexing with a factor uses its integer codes, not its labels):
vals[x]
#   Small  Medium  Medium Compact Compact  Medium  Medium   Large   Large    <NA>
#       1       0       0     -10     -10       0       0      -1      -1      NA
This silently gives the wrong result if you change the order in vals, e.g.:
vals <- c(Large=-1,Compact=-10,Medium=0,Small=1,Sporty=10,Van=20)
vals[x]
#   Small  Medium  Medium   Large   Large  Medium  Medium Compact Compact    <NA>
#       1       0       0      -1      -1       0       0     -10     -10      NA
You can get around this by subsetting on the character representation of x, which is matched against the names of vals rather than their order, like:
vals <- c(Large=-1,Compact=-10,Medium=0,Small=1,Sporty=10,Van=20)
vals[as.character(x)]
#   Small  Medium  Medium Compact Compact  Medium  Medium   Large   Large    <NA>
#       1       0       0     -10     -10       0       0      -1      -1      NA
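The name-based lookup can be wrapped in a small helper. This is only a sketch: recode_by_name is a hypothetical function name, and the check for unmapped values is an addition that is not part of the answer above.

```r
# Hypothetical helper: map factor (or character) values to numbers by name.
recode_by_name <- function(x, mapping) {
  keys <- as.character(x)
  # Fail loudly if a non-missing value has no entry in the mapping.
  unmapped <- setdiff(unique(keys[!is.na(keys)]), names(mapping))
  if (length(unmapped) > 0) {
    stop("No mapping for: ", paste(unmapped, collapse = ", "))
  }
  # Look up by name, so the order of `mapping` does not matter.
  # NA inputs stay NA; unname() drops the labels from the result.
  unname(mapping[keys])
}

# Deliberately not in level order.
vals <- c(Large = -1, Compact = -10, Medium = 0, Small = 1, Sporty = 10, Van = 20)
x <- factor(c("Small", "Medium", "Compact", "Large", NA))
result <- recode_by_name(x, vals)
print(result)
```

For the data in the question, the call would presumably be car90$Type <- recode_by_name(car90$Type, vals).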
Upvotes: 1
Reputation: 36
Use merge(), as in the following example.
First create a data frame with the values you want. In this scenario you would write
dictionary <- data.frame(Type = c('Compact', 'Large', 'Medium', 'Small', 'Sporty', 'Van'),
Values = c(-10, -1, 0, 1, 10, 20))
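# Pass a one-column data frame, not the bare vector car90$Type: merge() needs
# a shared column name to join on, otherwise it returns a Cartesian product.
output <- merge(car90["Type"], dictionary)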
IMPORTANT: This example doesn't take NA into account. Rows where Type is NA match nothing in the dictionary, so they won't be part of the output. To keep them, either pass all.x = TRUE (their Values will be NA) or include NA in the dictionary as a type with its own value.
And the resulting data frame is formatted as you want it.
NOTE: It's easier if the columns are named exactly the same, but you can specify the columns to join on with by.x and by.y; see ?merge for more.
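The NA behaviour described above can be checked on a toy data frame. The cars data frame here is made-up data for illustration, not car90:

```r
# Toy stand-in for car90["Type"], including one missing value (hypothetical data).
cars <- data.frame(Type = c("Small", "Medium", NA, "Compact"))

dictionary <- data.frame(Type = c("Compact", "Large", "Medium", "Small", "Sporty", "Van"),
                         Values = c(-10, -1, 0, 1, 10, 20))

# Default inner join: the NA row has no match in the dictionary and is dropped.
inner <- merge(cars, dictionary)

# all.x = TRUE keeps every row of cars; the unmatched row gets Values = NA.
kept <- merge(cars, dictionary, all.x = TRUE)

print(nrow(inner))
print(nrow(kept))
```

Note that merge() sorts its result by the join column by default, so the rows do not come back in their original order.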
Upvotes: 0
Reputation: 16090
This is a join operation:
encode <- data.frame(Type = c("Compact", "Large", "Medium", "Small", "Sporty", "Van"), TypeValue = c(-10,-1,0,1,10,20))
car90 <- merge(car90, encode, all.x = TRUE)
# or using dplyr
library(dplyr)
car90 <- left_join(car90, encode, by = "Type")
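One thing to watch with either join: merge() reorders rows and returns fresh row names, and dplyr joins are not meant to preserve row names either, so the car names stored as row names in car90 would be lost. As an alternative that is not part of the answer above, a lookup with base R's match() adds the column in place. A minimal sketch on toy data (the "Unknown car" row is made up):

```r
# Toy stand-in for car90, with car names as row names (hypothetical data).
cars <- data.frame(Type = factor(c("Small", "Medium", NA, "Compact")),
                   row.names = c("Acura Integra", "Acura Legend",
                                 "Unknown car", "Audi 80"))

encode <- data.frame(Type = c("Compact", "Large", "Medium", "Small", "Sporty", "Van"),
                     TypeValue = c(-10, -1, 0, 1, 10, 20))

# match() returns, for each cars$Type, its row position in encode$Type
# (NA where there is no match). Factors are compared by their labels.
cars$TypeValue <- encode$TypeValue[match(cars$Type, encode$Type)]

print(cars)
```

Row order and row names are untouched, and rows with a missing Type get NA.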
Upvotes: 0
Reputation: 9816
You can try this:
x <- c('Compact', 'Large', 'Medium', 'Small', 'Sporty', 'Van')
y <- factor(x, levels = c('Compact', 'Large', 'Medium', 'Small', 'Sporty', 'Van'),
labels = c(-10, -1, 0, 1, 10, 20))
as.numeric(as.character(y))
[1] -10 -1 0 1 10 20
For your case, you can call:
car90$Type <- factor(car90$Type, levels = c('Compact', 'Large', 'Medium', 'Small', 'Sporty', 'Van'),
labels = c(-10, -1, 0, 1, 10, 20))
car90$Type <- as.numeric(as.character(car90$Type))
Upvotes: 1
Reputation: 9687
I would just use vector subscripting; here's an example:
R> a <- as.factor(c("C", "L", "M", "L", "C"))
R> a
[1] C L M L C
Levels: C L M
R> b <- c(C = -10, L = -1, M = 0)
R> b
  C   L   M
-10  -1   0
R> b[a]
  C   L   M   L   C
-10  -1   0  -1 -10
Upvotes: 1