Reputation: 323
I have a dataset looks like:
classification Interest Age Gender
Card battle IL029 18-24 male
Card battle IL001 45-54 male
Card battle IL001 18-24 male
Card battle IL001 35-44 male
Card battle IL001 35-44 male
Card battle IL013 35-44 male
How to replace "18-24" to 20,"35-44" to 40 and "45-54" to 50 in the age column?
Upvotes: 2
Views: 16652
Reputation: 34763
A data.table
solution is to merge (much easier to extend to more complicated cases):
library(data.table)
#your data
DT = data.table(
classification = "Card battle",
Interest = sprintf('IL%03d', c(29, 1, 1, 1, 1, 13)),
Age = c("18-24","45-54","18-24", rep("35-44", 3L)),
Gender = "male"
)
#conversion table
convert = data.table(
Age_range = c("18-24", "45-54", "35-44"),
#need to keep as string here since
# the target column to overwrite is character
Age_middle = paste0(c(20, 40, 50))
)
#replace Age, then set its class
DT[convert, on = c(Age = 'Age_range'), Age := i.Age_middle]
# now convert back to numeric
DT[ , Age := as.numeric(Age)]
You might consider keeping the range column around, and simply adding a rounded age column, which would make for cleaner code:
convert = data.table(
Age_range = c("18-24","45-54","35-44"),
Age_middle = c(20L,40L,50L)
)
DT[convert, Age_middle := i.Age_middle]
DT
# classification Interest Age Gender age_rounded
# 1: Card battle IL029 18-24 male 20
# 2: Card battle IL001 18-24 male 20
# 3: Card battle IL001 35-44 male 50
# 4: Card battle IL001 35-44 male 50
# 5: Card battle IL013 35-44 male 50
# 6: Card battle IL001 45-54 male 40
Upvotes: 2
Reputation: 24074
Another way, using regex
, capturing the second to last digit and putting a 0 after:
DF$Age <- as.numeric(sub(".*(\\d)\\d$", "\\10", as.character(DF$Age)))
(or simply as.numeric(sub(".*(\\d)\\d$", "\\10", DF$Age))
if Age
is not a factor)
DF
# classification Interest Age Gender
#1 Card battle IL029 20 male
#2 Card battle IL001 50 male
#3 Card battle IL001 20 male
#4 Card battle IL001 40 male
#5 Card battle IL001 40 male
#6 Card battle IL013 40 male
Upvotes: 0
Reputation: 270348
This will replace Age
with a factor having labels 20, 40 and 50:
transform(DF, Age = factor(Age,
levels = c("18-24", "35-44", "45-54"),
labels = c(20, 40, 50)))
giving:
classification Interest Age Gender
1 Card battle IL029 20 male
2 Card battle IL001 50 male
3 Card battle IL001 20 male
4 Card battle IL001 40 male
5 Card battle IL001 40 male
6 Card battle IL013 40 male
Actually it can likely be reduced to this although the above is a bit safer:
transform(DF, Age = factor(Age, labels = c(20, 40, 50)))
If you prefer an integer column then:
transform(DF, Age = as.integer(as.character(
factor(Age,
levels = c("18-24", "35-44", "45-54"),
labels = c(20, 40, 50)
)
)))
and, again, we could likely omit the levels
argument:
transform(DF, Age = as.integer(as.character(factor(Age, labels = c(20, 40, 50)))))
Note: We used this as input:
DF <-
structure(list(classification = structure(c(1L, 1L, 1L, 1L, 1L,
1L), .Label = "Card battle", class = "factor"), Interest = structure(c(3L,
1L, 1L, 1L, 1L, 2L), .Label = c("IL001", "IL013", "IL029"), class = "factor"),
Age = structure(c(1L, 3L, 1L, 2L, 2L, 2L), .Label = c("18-24",
"35-44", "45-54"), class = "factor"), Gender = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "male", class = "factor")), .Names = c("classification",
"Interest", "Age", "Gender"), class = "data.frame", row.names = c(NA,
-6L))
Upvotes: 4
Reputation: 2777
Try something like this
data$age <- as.character(data$age)
data$age[which(data$age=="18-24")] <- "20"
data$age[which(data$age=="35-44")] <- "40"
data$age[which(data$age=="45-54")] <- "50"
data$age <- as.numeric(data$age)
Upvotes: 5