Reputation: 77
Here is some mock data corresponding to the real dataset I am using:
library(dplyr)

a <- c("a","b","c","d","e","f","g","h","i","j")
b <- 1:10
names <- c("Alex","Ale","Alexandra","Alexander","Ali","Amanda","Alix","Ajax","Aley","Ajay")
data <- data.frame(a, b, names)
data <- data %>%
  mutate(gender = NA)
I want to assign a "gender" value to each entry of the names variable in my dataset. I don't want to do this manually because I am dealing with thousands of observations. I do, however, have these vectors, which list the "names" values belonging to each gender:
male <- c("Alex", "Ale", "Alexander")
female <- c("Alexandra", "Ali", "Amanda")
noanswer <- c("Alix", "Ajax", "Aley", "Ajay")
However, I don't know how to use them to assign a "gender" value to the corresponding "names" in my dataset.
Here is what I tried:
data$gender[data$names== male] <- "Male"
And:
data$gender[data$names== c("Alex", "Ale", "Alexander")] <- "Male"
This code does not assign "Male" to all of the matching rows. I receive a warning message:
"Warning message:
In data$names == c("Alex", "Ale", "Alexander") :
longer object length is not a multiple of shorter object length"
Does anyone know how I can assign values to my gender variable corresponding to the names variable?
Upvotes: 2
Views: 2472
Reputation: 21908
You can also use the following solution:
library(dplyr)
male <- c("Alex", "Ale", "Alexander")
female <- c("Alexandra", "Ali", "Amanda")
noanswer <- c("Alix", "Ajax", "Aley", "Ajay")
data %>%
  mutate(gender = case_when(
    names %in% male ~ "Male",
    names %in% female ~ "Female",
    names %in% noanswer ~ "Noanswer"
  ))
   a  b     names   gender
1  a  1      Alex     Male
2  b  2       Ale     Male
3  c  3 Alexandra   Female
4  d  4 Alexander     Male
5  e  5       Ali   Female
6  f  6    Amanda   Female
7  g  7      Alix Noanswer
8  h  8      Ajax Noanswer
9  i  9      Aley Noanswer
10 j 10      Ajay Noanswer
Upvotes: 2
Reputation: 887048
We can create a named list, stack it into a two-column dataset, and then use that in a join:
new <- stack(list(male = male, female = female, noanswer = noanswer))
names(new) <- c("names", "gender")
data <- data %>%
  left_join(new, by = "names")
-output
data
   a  b     names   gender
1  a  1      Alex     male
2  b  2       Ale     male
3  c  3 Alexandra   female
4  d  4 Alexander     male
5  e  5       Ali   female
6  f  6    Amanda   female
7  g  7      Alix noanswer
8  h  8      Ajax noanswer
9  i  9      Aley noanswer
10 j 10      Ajay noanswer
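If dplyr is not available, the same lookup can be sketched in base R with match(), which returns the row of the lookup table for each name (or NA when the name is not listed). The four-row `data` below is a cut-down stand-in for the question's data, and "Zed" is a hypothetical name added only to show what happens to an unlisted value.

```r
male <- c("Alex", "Ale", "Alexander")
female <- c("Alexandra", "Ali", "Amanda")
noanswer <- c("Alix", "Ajax", "Aley", "Ajay")

# Cut-down stand-in for the question's data; "Zed" is a hypothetical
# name that appears in none of the three vectors.
data <- data.frame(names = c("Alex", "Ali", "Ajax", "Zed"),
                   stringsAsFactors = FALSE)

# Two-column lookup table: one row per name, with its group label.
new <- stack(list(male = male, female = female, noanswer = noanswer))
names(new) <- c("names", "gender")

# match() gives the row of `new` for each name, or NA if it is not listed.
data$gender <- as.character(new$gender[match(data$names, new$names)])
data
```

Like the left join, this leaves gender as NA for any name that is missing from the lookup table.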
Regarding the OP's warning: == is an elementwise comparison, so it is appropriate mostly when one of the vectors has length 1 (and gets recycled) or when both have the same length. Here the lengths differ (10 and 3), so the shorter vector is recycled, and because 10 is not a multiple of 3, R issues a warning. Sometimes there is no warning (when the longer length is a multiple of the shorter one), but the result is still incorrect, because the comparison is positional. If the first vector has length 5 and the second has length 3, the comparison is similar to:
v1[1] == v2[1]
v1[2] == v2[2]
v1[3] == v2[3]
v1[4] == v2[1]
...
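As a small runnable illustration of that recycling, here is a sketch using the first five names from the question as `v1` and the `male` vector as `v2` (the variable names `recycled` and `eq` are mine, chosen for the example):

```r
v1 <- c("Alex", "Ale", "Alexandra", "Alexander", "Ali")
v2 <- c("Alex", "Ale", "Alexander")

# What == effectively compares v1 against: v2 repeated out to length 5.
recycled <- rep_len(v2, length(v1))
recycled

# Positional comparison; the warning is suppressed here only to keep
# the output clean.
eq <- suppressWarnings(v1 == v2)
eq
```

"Alexander" sits at position 4 of `v1`, where it is compared with the recycled "Alex", so it comes back FALSE even though it is present in `v2`.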
Instead, we may use %in%, which tests whether each name occurs anywhere in the other vector:
data$gender[data$names %in% male] <- "Male"
data$gender[data$names %in% female] <- "Female"
data$gender[data$names %in% noanswer] <- "noanswer"
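With thousands of observations, it may be worth checking afterwards which names were not covered by any of the three vectors. A minimal sketch, assuming a cut-down `data` with one hypothetical unlisted name ("Zed") and a helper variable `unmatched` that is not part of the original code:

```r
male <- c("Alex", "Ale", "Alexander")
female <- c("Alexandra", "Ali", "Amanda")
noanswer <- c("Alix", "Ajax", "Aley", "Ajay")

# Cut-down stand-in for the question's data; "Zed" is hypothetical.
data <- data.frame(names = c("Alex", "Amanda", "Aley", "Zed"),
                   gender = NA_character_,
                   stringsAsFactors = FALSE)

data$gender[data$names %in% male] <- "Male"
data$gender[data$names %in% female] <- "Female"
data$gender[data$names %in% noanswer] <- "noanswer"

# Names that are still NA were not found in any of the three vectors.
unmatched <- unique(data$names[is.na(data$gender)])
unmatched
```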
data <- structure(list(a = c("a", "b", "c", "d", "e", "f", "g", "h",
"i", "j"), b = 1:10, names = c("Alex", "Ale", "Alexandra", "Alexander",
"Ali", "Amanda", "Alix", "Ajax", "Aley", "Ajay")),
class = "data.frame", row.names = c(NA,
-10L))
Upvotes: 2