Jeffery Chen
Jeffery Chen

Reputation: 323

R:How to replace string to integer?

I have a dataset looks like:

classification  Interest    Age     Gender
Card battle     IL029       18-24   male
Card battle     IL001       45-54   male
Card battle     IL001       18-24   male
Card battle     IL001       35-44   male
Card battle     IL001       35-44   male
Card battle     IL013       35-44   male

How to replace "18-24" to 20,"35-44" to 40 and "45-54" to 50 in the age column?

Upvotes: 2

Views: 16652

Answers (4)

MichaelChirico
MichaelChirico

Reputation: 34763

A data.table solution is to merge (much easier to extend to more complicated cases):

library(data.table)
#your data
DT = data.table(
  classification = "Card battle",
  Interest = sprintf('IL%03d', c(29, 1, 1, 1, 1, 13)),
  Age = c("18-24","45-54","18-24", rep("35-44", 3L)),
  Gender = "male"
)

#conversion table
convert = data.table(
  Age_range = c("18-24", "45-54", "35-44"),
  #need to keep as string here since 
  #  the target column to overwrite is character
  Age_middle = paste0(c(20, 40, 50))
)

#replace Age, then set its class
DT[convert, on = c(Age = 'Age_range'), Age := i.Age_middle]
#  now convert back to numeric
DT[ , Age := as.numeric(Age)]

You might consider keeping the range column around, and simply adding a rounded age column, which would make for cleaner code:

convert = data.table(
  Age_range = c("18-24","45-54","35-44"),
  Age_middle = c(20L,40L,50L)
)

DT[convert, Age_middle := i.Age_middle]
DT
#    classification Interest   Age Gender age_rounded
# 1:    Card battle    IL029 18-24   male          20
# 2:    Card battle    IL001 18-24   male          20
# 3:    Card battle    IL001 35-44   male          50
# 4:    Card battle    IL001 35-44   male          50
# 5:    Card battle    IL013 35-44   male          50
# 6:    Card battle    IL001 45-54   male          40

Upvotes: 2

Cath
Cath

Reputation: 24074

Another way, using regex, capturing the second to last digit and putting a 0 after:

DF$Age <- as.numeric(sub(".*(\\d)\\d$", "\\10", as.character(DF$Age)))

(or simply as.numeric(sub(".*(\\d)\\d$", "\\10", DF$Age)) if Age is not a factor)

DF
#  classification Interest Age Gender
#1    Card battle    IL029  20   male
#2    Card battle    IL001  50   male
#3    Card battle    IL001  20   male
#4    Card battle    IL001  40   male
#5    Card battle    IL001  40   male
#6    Card battle    IL013  40   male

Upvotes: 0

G. Grothendieck
G. Grothendieck

Reputation: 270348

This will replace Age with a factor having labels 20, 40 and 50:

transform(DF, Age = factor(Age, 
       levels = c("18-24", "35-44", "45-54"),
       labels = c(20, 40, 50)))

giving:

  classification Interest Age Gender
1    Card battle    IL029  20   male
2    Card battle    IL001  50   male
3    Card battle    IL001  20   male
4    Card battle    IL001  40   male
5    Card battle    IL001  40   male
6    Card battle    IL013  40   male

Actually it can likely be reduced to this although the above is a bit safer:

transform(DF, Age = factor(Age, labels = c(20, 40, 50)))

If you prefer an integer column then:

transform(DF, Age = as.integer(as.character(
       factor(Age, 
         levels = c("18-24", "35-44", "45-54"),
         labels = c(20, 40, 50)
       )
 )))

and, again, we could likely omit the levels argument:

transform(DF, Age = as.integer(as.character(factor(Age, labels = c(20, 40, 50)))))

Note: We used this as input:

DF <-
structure(list(classification = structure(c(1L, 1L, 1L, 1L, 1L, 
1L), .Label = "Card battle", class = "factor"), Interest = structure(c(3L, 
1L, 1L, 1L, 1L, 2L), .Label = c("IL001", "IL013", "IL029"), class = "factor"), 
    Age = structure(c(1L, 3L, 1L, 2L, 2L, 2L), .Label = c("18-24", 
    "35-44", "45-54"), class = "factor"), Gender = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L), .Label = "male", class = "factor")), .Names = c("classification", 
"Interest", "Age", "Gender"), class = "data.frame", row.names = c(NA, 
-6L))

Upvotes: 4

Fedorenko Kristina
Fedorenko Kristina

Reputation: 2777

Try something like this

data$age <- as.character(data$age)
data$age[which(data$age=="18-24")] <- "20"
data$age[which(data$age=="35-44")] <- "40"
data$age[which(data$age=="45-54")] <- "50"
data$age <- as.numeric(data$age)

Upvotes: 5

Related Questions