freakyhat

Reputation: 471

Map array of strings to an array of integers

Suppose I have a column of colors in a data frame, say c("Red", "Blue", "Blue", "Orange"). I would like to convert it to c(1, 2, 2, 3), with the mapping:

Red as 1
Blue as 2
Orange as 3

Is there a simpler way of doing this other than the obvious if/else or switch functions?

Upvotes: 5

Views: 11309

Answers (4)

Andrii

Reputation: 3043

Here is a function based on previous code:

# Recode a 'string' column into 'integer' codes
recode_str_int <- function(df, feature) {

  # 1. Column values as character (also handles factor columns)
  values <- as.character(df[[feature]])

  # 2. Sorted unique strings; codes follow alphabetical order
  list_str <- sort(unique(values))

  # 3. Named lookup vector: string -> integer code
  list_int <- seq_along(list_str)
  names(list_int) <- list_str

  # 4. Result: look up each value and drop the names
  unname(list_int[values])

} # recode_str_int

Call it like:

 df$new_feature <- recode_str_int(df, "feature")
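
One thing to keep in mind with any approach built on sort(unique(...)): the codes follow alphabetical order, which is not the Red = 1, Blue = 2, Orange = 3 order asked for in the question. As a minimal sketch (not part of the function above, and using base R's match() as a swapped-in technique), the difference looks like this. The variable names `x`, `alphabetical` and `wanted` are only illustrative:

```r
# Example data taken from the question
x <- c("Red", "Blue", "Blue", "Orange")

# Codes in alphabetical order, as sort(unique(...)) would assign them
alphabetical <- match(x, sort(unique(x)))

# Codes in an explicitly chosen order
wanted <- match(x, c("Red", "Blue", "Orange"))

print(alphabetical)  # Blue = 1, Orange = 2, Red = 3
print(wanted)        # Red = 1, Blue = 2, Orange = 3
```

If the order of the codes matters, passing the desired order explicitly is probably the safer choice.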

Upvotes: 0

zx8754

Reputation: 56179

Using car::recode() function:

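library(car)

# Example input from the question
x <- c("Red", "Blue", "Blue", "Orange")

# car:: prefix avoids a clash with dplyr::recode() if dplyr is also loaded
car::recode(x, "'Red'=1; 'Blue'=2; 'Orange'=3;")
# [1] 1 2 2 3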

Upvotes: 1

CnrL

Reputation: 2589

Set up a named vector that describes how the strings map to the integers:

colors=c(1,2,3)
names(colors)=c("Red", "Blue", "Orange")

Now index the named vector with the colours from your data frame to look up the number for each one:

>colors[c("Red","Blue","Blue","Orange")]
   Red   Blue   Blue Orange 
     1      2      2      3 
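
As a rough sketch of how the same lookup might be applied to a data frame column: the names `df`, `colour` and `code` below are made up for illustration, and unname() is only there to drop the names that the lookup carries along.

```r
# Named lookup vector: string -> integer code
colors <- c(Red = 1L, Blue = 2L, Orange = 3L)

# Hypothetical data frame holding the strings from the question
df <- data.frame(colour = c("Red", "Blue", "Blue", "Orange"),
                 stringsAsFactors = FALSE)

# Look up each string; as.character() guards against factor columns,
# unname() drops the names attached by the lookup
df$code <- unname(colors[as.character(df$colour)])

print(df)
```

A string that has no entry in the lookup vector comes back as NA, which makes unmapped values easy to spot.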

UPDATE to address questions below. Here's an example of what I think you're trying to do.

dataframe=data.frame(Gender=c("F","F","M","F","F","M"))
strings=sort(unique(dataframe$Gender))
colors=1:length(strings)
names(colors)=strings
# as.character() so that a factor column is looked up by label, not by level code
dataframe$Colours=colors[as.character(dataframe$Gender)]

You can then look at the result:

> dataframe
  Gender Colours
1      F       1
2      F       1
3      M       2
4      F       1
5      F       1
6      M       2

Note that this example assumes that you have no specific mapping between Gender and Colours in mind. If this is really the case, then it might be simpler to just follow the comment from @alexis_laz instead.
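
The comment itself is not reproduced here, so the following is only a guess at the kind of shortcut meant: when no specific mapping is needed, a common base R idiom is to go through a factor and take its integer codes. The variable names are illustrative.

```r
# No specific mapping: levels are sorted, so F = 1 and M = 2
gender <- c("F", "F", "M", "F", "F", "M")
gender_codes <- as.integer(factor(gender))

# Specific mapping: give the levels in the order the codes should follow
colour <- c("Red", "Blue", "Blue", "Orange")
colour_codes <- as.integer(factor(colour, levels = c("Red", "Blue", "Orange")))

print(gender_codes)
print(colour_codes)
```

Supplying levels explicitly covers the case from the question, where Red, Blue and Orange should map to 1, 2 and 3 rather than to their alphabetical positions.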

Upvotes: 13

lawyeR

Reputation: 7664

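I may be missing something, but I believe this method works. Coerce the column of words (below, "names") to a factor, then revalue its levels to the numbers in "colors". Note that revalue() returns a factor, so a final conversion is needed to get actual integers.

library(plyr)

colors <- c("1","2","3")
names <- c("Red", "Blue", "Orange")
df <- data.frame(names, colors)
df$names <- as.factor(df$names)
# revalue() expects a named character vector: old label = new label
df$names <- revalue(x = df$names, c("Red" = "1", "Blue" = "2", "Orange" = "3"))
# The result is still a factor; convert its labels to integers
df$names <- as.integer(as.character(df$names))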

Upvotes: 4
