Reputation: 471
Suppose I have a column colors in a data frame, say c("Red", "Blue", "Blue", "Orange"). I would like to get it as c(1,2,2,3):
Red as 1
Blue as 2
Orange as 3
Is there a simpler way of doing this other than the obvious if/else or switch functions?
Upvotes: 5
Views: 11309
Reputation: 3043
Here is a function based on previous code:
# Recode a 'string' column into 'integer' codes
recode_str_int <- function(df, feature) {
  # 1. Unique values
  # 1.1. 'string' values, sorted (as.character() so factor columns work too)
  values <- as.character(df[[feature]])
  list_str <- sort(unique(values))
  # 1.2. 'integer' values
  list_int <- seq_along(list_str)
  # 2. Named lookup vector: names are the strings, values are the integers
  names(list_int) <- list_str
  # 3. Result: look up each value by name and drop the names
  unname(list_int[values])
} # recode_str_int
Call it like:
df$new_feature <- recode_str_int(df, "feature")
Upvotes: 0
Reputation: 56179
Using car::recode() function:
library(car)
x <- c("Red", "Blue", "Blue", "Orange")
recode(x, "'Red'=1; 'Blue'=2; 'Orange'=3;")
# [1] 1 2 2 3
Upvotes: 1
Reputation: 2589
Set up a named vector describing how the strings map to the integers:
colors=c(1,2,3)
names(colors)=c("Red", "Blue", "Orange")
Now use the named vector to generate a list of numbers associated with the colours in your data frame:
> colors[c("Red","Blue","Blue","Orange")]
   Red   Blue   Blue Orange
     1      2      2      3
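Applied to a data frame column, the same lookup might look like the sketch below. The data frame `df`, the column names `colors` and `color_id`, and the vector name `lookup` are made up for illustration. `as.character()` guards against the column being a factor (indexing by a factor would use its integer codes rather than its labels), and `unname()` drops the names so the result is a plain numeric vector.

```r
# Hypothetical data frame holding the colours from the question
df <- data.frame(colors = c("Red", "Blue", "Blue", "Orange"))

# Named lookup vector: names are the strings, values are the integers
lookup <- c(Red = 1, Blue = 2, Orange = 3)

# Index by name; as.character() protects against factor columns,
# unname() drops the names from the result
df$color_id <- unname(lookup[as.character(df$colors)])

df$color_id
# [1] 1 2 2 3
```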
UPDATE to address questions below. Here's an example of what I think you're trying to do.
dataframe=data.frame(Gender=c("F","F","M","F","F","M"))
strings=sort(unique(dataframe$Gender))
colors=1:length(strings)
names(colors)=strings
dataframe$Colours=colors[dataframe$Gender]
Have a look at the result:
> dataframe
  Gender Colours
1      F       1
2      F       1
3      M       2
4      F       1
5      F       1
6      M       2
Note that this example assumes that you have no specific mapping between Gender and Colours in mind. If this is really the case, then it might be simpler to just follow the comment from @alexis_laz instead.
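That comment is not reproduced here, but when any consistent numbering will do, a common base R shortcut is to go through a factor or `match()`. The sketch below is an assumption about what such a simpler route could look like, not a quote of the comment; the variable names `x` and `g` are placeholders.

```r
x <- c("Red", "Blue", "Blue", "Orange")

# Explicit levels fix the mapping: Red = 1, Blue = 2, Orange = 3
as.integer(factor(x, levels = c("Red", "Blue", "Orange")))
# [1] 1 2 2 3

# match() gives the same result without creating a factor
match(x, c("Red", "Blue", "Orange"))
# [1] 1 2 2 3

# With no levels given, factor() sorts the unique values,
# so the codes follow that sorted order (F = 1, M = 2)
g <- c("F", "F", "M", "F", "F", "M")
as.integer(factor(g))
# [1] 1 1 2 1 1 2
```

Passing `levels` explicitly is what keeps the numbering under your control; without it, the codes depend on the sort order of the values.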
Upvotes: 13
Reputation: 7664
I must be missing something, but I believe this method would work. Having coerced your column with words (below, "names") to a factor, you revalue them by your numbers in "colors".
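library(plyr)
colors <- c("1", "2", "3")
names <- c("Red", "Blue", "Orange")
df <- data.frame(names, colors)
df$names <- as.factor(df$names)
# revalue() relabels the factor levels; the result is still a factor
df$names <- revalue(x = df$names, c("Red" = "1", "Blue" = "2", "Orange" = "3"))
# Convert the relabelled factor to actual numbers
df$names <- as.numeric(as.character(df$names))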
Upvotes: 4