I have a dataset with a column named Age whose values are: 24 or under, 25 to 34, 35 to 44, 45 to 54, 25 to 34, 24 or under, 35 to 44, 25 to 34, 45 to 54.
I need to convert this categorical variable to numbers as follows: "24 or under" = 1, "25 to 34" = 2, "35 to 44" = 3, "45 to 54" = 4.
Can anyone help me here?
Many thanks in advance.
Upvotes: 0
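As pieterbons described, your Age field is practically a factor already. If Age is stored as a factor and you convert it to numeric, you get the factor's integer codes as numeric categories.
# stringsAsFactors = TRUE keeps Age a factor; since R 4.0.0, data.frame() leaves text columns as character
df <- data.frame(Age = c("24 or under", "25 to 34", "35 to 44", "45 to 54"),
                 stringsAsFactors = TRUE)
df$Age <- as.numeric(df$Age)  # the factor's integer codes: 1 2 3 4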
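Whether this works depends on the column type: as.numeric() only returns level codes for a factor. On a plain character vector it tries to parse the text as numbers and gives NA. A small base R comparison (the names ages, chr_codes and fct_codes are just illustrative):

```r
ages <- c("24 or under", "25 to 34", "35 to 44", "45 to 54")

# Character input: the labels are not numbers, so every value becomes NA
chr_codes <- suppressWarnings(as.numeric(ages))
print(chr_codes)

# Factor input: as.numeric() returns each value's level position
fct_codes <- as.numeric(factor(ages))
print(fct_codes)
```

Here factor() sorts the labels alphabetically, which happens to be the intended order, so fct_codes is 1 2 3 4.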
You can also keep Age as text and create a new field holding the numeric codes you described. This is particularly helpful when a text variable has a distinct order that you want to control explicitly. There are multiple ways to do this:
# 1) Base R (starting again from the original text column; factor() sorts the labels,
#    which matches the intended order here)
df$age_new <- as.numeric(factor(df$Age))
# 2) dplyr: every label is listed explicitly, so unexpected values become NA
library(dplyr)
df <- df %>%
  mutate(age_new = case_when(Age == "24 or under" ~ 1,
                             Age == "25 to 34" ~ 2,
                             Age == "35 to 44" ~ 3,
                             Age == "45 to 54" ~ 4))
#> df
# Age age_new
#1 24 or under 1
#2 25 to 34 2
#3 35 to 44 3
#4 45 to 54 4
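Another base R option, not part of the answer above, is a named lookup vector: index it by the labels and you get the codes back, with NA for any label missing from the table. The name age_codes is hypothetical:

```r
df <- data.frame(Age = c("24 or under", "25 to 34", "35 to 44", "45 to 54", "25 to 34"))

# Hypothetical lookup table: names are the labels, values are the codes
age_codes <- c("24 or under" = 1, "25 to 34" = 2, "35 to 44" = 3, "45 to 54" = 4)

# as.character() guards against Age being a factor, because indexing by a
# factor uses its integer codes rather than its labels; unname() drops the names
df$age_new <- unname(age_codes[as.character(df$Age)])
print(df)

# A label that is not in the table comes back as NA
unmatched <- unname(age_codes[c("25 to 34", "55 or over")])
print(unmatched)
```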
Upvotes: 1
If you want dummy variables (i.e., 0 or 1), you can use a dplyr::if_else() statement to create a new variable for each category:
library(dplyr)
Age <- c("24 or under", "25 to 34", "35 to 44", "45 to 54")
data.frame(age = Age) %>%
  mutate("24 or under" = if_else(age == Age[1], 1, 0),
         "25 to 34" = if_else(age == Age[2], 1, 0),
         "35 to 44" = if_else(age == Age[3], 1, 0),
         "45 to 54" = if_else(age == Age[4], 1, 0))
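As a base R alternative (swapped in here, not the answer's dplyr approach), model.matrix() builds the same 0/1 columns from a factor without one if_else() per category. Dropping the intercept with - 1 keeps a column for every level. A sketch, with ages and dummies as illustrative names:

```r
ages <- data.frame(age = factor(c("24 or under", "25 to 34", "35 to 44", "45 to 54", "25 to 34")))

# ~ age - 1 removes the intercept, so each level gets its own indicator column
dummies <- model.matrix(~ age - 1, data = ages)
print(dummies)
```

Each row has exactly one 1, in the column of its age band.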
If you want a numeric value instead, code your variable as a factor, set the levels in the order you want, and then use as.numeric():
Age <- factor(c("24 or under", "25 to 34", "35 to 44", "45 to 54"),
              levels = c("24 or under", "25 to 34", "35 to 44", "45 to 54"))
as.numeric(Age)  # 1 2 3 4: positions in the levels vector
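The same idea applied to the full Age column from the question, where labels repeat. The level vector is written once and reused (age_levels and age_raw are illustrative names); a label that is not listed in levels becomes NA:

```r
age_levels <- c("24 or under", "25 to 34", "35 to 44", "45 to 54")
age_raw <- c("24 or under", "25 to 34", "35 to 44", "45 to 54", "25 to 34",
             "24 or under", "35 to 44", "25 to 34", "45 to 54")

# Explicit levels fix the code order; repeated labels share the same code
age_codes <- as.numeric(factor(age_raw, levels = age_levels))
print(age_codes)

# An unlisted label has no level, so its code is NA
unknown_code <- as.numeric(factor("55 or over", levels = age_levels))
print(unknown_code)
```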
Upvotes: 0
You can use nested ifelse() statements:
set.seed(12)
df <- data.frame(Age = sample(c("24 or under", "25 to 34", "35 to 44", "45 to 54"), 20, replace = TRUE))
# Each ifelse() tests one label; anything not matched by the first three becomes 4
df$Age_new <- ifelse(df$Age == "24 or under", 1,
              ifelse(df$Age == "25 to 34", 2,
              ifelse(df$Age == "35 to 44", 3, 4)))
Result:
df
Age Age_new
1 25 to 34 2
2 35 to 44 3
3 24 or under 1
4 45 to 54 4
5 24 or under 1
6 35 to 44 3
7 45 to 54 4
8 25 to 34 2
9 45 to 54 4
10 35 to 44 3
11 24 or under 1
12 35 to 44 3
13 25 to 34 2
14 24 or under 1
15 25 to 34 2
16 35 to 44 3
17 25 to 34 2
18 25 to 34 2
19 35 to 44 3
20 25 to 34 2
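One thing to watch with this pattern: the final 4 is a catch-all, so any unexpected label (a typo, or a hypothetical new band such as "55 or over") is silently coded as 4. A sketch of a stricter variant that also tests the last band and returns NA otherwise:

```r
ages <- c("24 or under", "45 to 54", "55 or over")

# Original pattern: anything not matched earlier falls through to 4
loose <- ifelse(ages == "24 or under", 1,
         ifelse(ages == "25 to 34", 2,
         ifelse(ages == "35 to 44", 3, 4)))

# Stricter variant: the last band is checked explicitly, so unknown labels become NA
strict <- ifelse(ages == "24 or under", 1,
          ifelse(ages == "25 to 34", 2,
          ifelse(ages == "35 to 44", 3,
          ifelse(ages == "45 to 54", 4, NA))))

print(loose)   # "55 or over" is coded as 4
print(strict)  # "55 or over" is NA
```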
Upvotes: 1
If your Age column is a factor, this already happens behind the scenes: each value is stored as an integer with a corresponding text label. To get these integers explicitly, use as.numeric():
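# stringsAsFactors = TRUE is needed in R >= 4.0.0, where text columns stay character by default
df <- data.frame(Age = c("24 or under", "25 to 34", "35 to 44", "45 to 54"),
                 stringsAsFactors = TRUE)
df$Age_cat <- as.numeric(df$Age)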
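You might run into ordering issues, because by default the factor levels are sorted alphabetically rather than in the order you intend. That works for these labels, but if your intended order differs, set the levels of the factor explicitly with the levels argument of factor().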
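For example, if the youngest band were labelled "under 25" (a hypothetical label, not from the question), alphabetical sorting would place it after the labels that start with digits:

```r
ages <- c("under 25", "25 to 34", "35 to 44", "45 to 54")

# Default levels are sorted, so "under 25" ends up last and gets code 4
default_codes <- as.numeric(factor(ages))

# Setting the levels explicitly restores the intended order
fixed_codes <- as.numeric(factor(ages, levels = c("under 25", "25 to 34", "35 to 44", "45 to 54")))

print(default_codes)
print(fixed_codes)
```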
Upvotes: 0