Reputation: 29
I want to change all the gender entries ('Male', 'Female', 'Woman', 'Man', 'man', etc.) to be more consistent, so that there are only three values (Male, Female and Non-binary). This is my current code:
# Cleaning of Specific Variable Types
removed <- removed %>%
  mutate(gender = substr(toupper(gender), 1, 1))
removed <- removed %>%
  mutate(gender = case_when(
    gender == "M" ~ "Male",
    gender == "F" ~ "Female",
    gender == "N" ~ "Non-binary")
  )
Upvotes: 2
Views: 551
Reputation: 76450
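The problem seems to be with the default value of `gender`. After each entry is reduced to its first letter, values such as "Woman" and "other" become "W" and "O", which match none of the conditions, so `case_when` returns NA for them. Use `TRUE` as the last condition instead of matching it with "N".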
Tested with the data in jay.sf's answer.
library(dplyr)
removed %>%
  mutate(
    gender = toupper(substr(gender, 1, 1)),
    gender = case_when(
      gender == "M" ~ "Male",
      gender %in% c("F", "W") ~ "Female",
      TRUE ~ "Non-binary"
    ))
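To see why the fallback matters, here is a minimal sketch on a few made-up values (the `demo` data frame is hypothetical, not the asker's data). It compares the result without and with a `TRUE` branch.

```r
library(dplyr)

# Hypothetical sample values, not the asker's real data
demo <- data.frame(gender = c("Male", "Woman", "other", "female"))

# Reduce each entry to its upper-case first letter: "M" "W" "O" "F"
first_letter <- toupper(substr(demo$gender, 1, 1))

# Without a fallback, "W" and "O" match no condition and become NA
no_fallback <- case_when(
  first_letter == "M" ~ "Male",
  first_letter == "F" ~ "Female",
  first_letter == "N" ~ "Non-binary"
)

# With TRUE as the last condition, every remaining value gets a label
with_fallback <- case_when(
  first_letter == "M" ~ "Male",
  first_letter %in% c("F", "W") ~ "Female",
  TRUE ~ "Non-binary"
)
```

Note that a first-letter rule also sends "MtF" to "Male" and "FtM" to "Female"; whether that is acceptable depends on how those entries should be categorized.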
Upvotes: 2
Reputation: 78937
This may be the long version, but it should work. Data from jay.sf (many thanks).
Recode `gender` using a `case_when` condition with `str_detect` and a pattern:
library(dplyr)
library(stringr)
# Capitalize each value to avoid interaction of "man" and "woman" in str_detect
removed$gender <- str_to_title(removed$gender)
# check for unique elements in `gender`
unique(removed$gender)
# [1] "Male" "Woman" "Other" "Mtf" "Female"
# [6] "Man" "Ftm" "Androgyne"
# define pattern for each category
Male <- paste(c("Male", "Man"), collapse = "|")
Female <- paste(c("Woman", "Female"), collapse = "|")
Non_binary <- paste(c("Other", "Mtf", "Ftm", "Androgyne"), collapse = "|")
# apply category with `case_when` and pattern:
removed %>%
  mutate(gender = case_when(
    str_detect(gender, Male) ~ "Male",
    str_detect(gender, Female) ~ "Female",
    str_detect(gender, Non_binary) ~ "Non-binary"))
Output:
gender
1 Male
2 Female
3 Male
4 Non-binary
5 Non-binary
6 Female
7 Male
8 Non-binary
9 Male
10 Male
11 Male
12 Female
13 Non-binary
14 Female
15 Female
16 Non-binary
17 Male
18 Female
19 Non-binary
20 Non-binary
21 Non-binary
22 Female
23 Female
24 Female
25 Female
26 Male
27 Non-binary
28 Male
29 Female
30 Non-binary
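The capitalization step is there because an unanchored pattern such as "man" also matches inside "woman". As an alternative sketch (not the answer's method), the patterns can be anchored with `^` and `$` and matched case-insensitively via stringr's `regex()`, so no prior capitalization is needed. The names `values`, `male_pat` and `female_pat` are made up for illustration.

```r
library(dplyr)
library(stringr)

# Unanchored, case-insensitive matching confuses "woman" with "man"
overlap <- str_detect("woman", regex("man", ignore_case = TRUE))

# Anchored patterns only match the whole value, ignoring case
male_pat   <- regex("^(male|man)$", ignore_case = TRUE)
female_pat <- regex("^(female|woman)$", ignore_case = TRUE)

# Hypothetical sample values
values <- c("Male", "Woman", "man", "female", "MtF")

# Anything that is neither pattern falls through to "Non-binary"
recoded <- case_when(
  str_detect(values, male_pat)   ~ "Male",
  str_detect(values, female_pat) ~ "Female",
  TRUE                           ~ "Non-binary"
)
```

Here the `TRUE` branch assigns "Non-binary" to everything else, including typos, so it is worth checking `unique()` of the raw column first.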
Upvotes: 2
Reputation: 72984
You probably have a data frame like this.
removed
# gender
# 1 Male
# 2 Woman
# 3 Male
# 4 other
# 5 MtF
# 6 female
# ...
You could now create a key table in a semi-automated way like so.
key <- data.frame(x=sort(unique(tolower(removed$gender))),
y=factor(c(3, 1, 3, 2, 2, 3, 3, 1),
labels=c('female', 'male', 'non-binary')))
Then use `match` to replace the labels.
library(dplyr)
removed %>%
mutate(gender=key$y[match(tolower(gender), key$x)])
# gender
# 1 male
# 2 female
# 3 male
# 4 non-binary
# 5 non-binary
# 6 female
# 7 ...
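The key table relies on the factor codes lining up with the sorted unique values. A variation that spells out each pair explicitly is a named character vector used as a lookup; this is only a sketch under that assumption, and `lookup` and the sample `gender` vector are hypothetical names.

```r
# Hypothetical sample values; the mapping mirrors the key table above
gender <- c("Male", "Woman", "other", "MtF", "female", "man")

# Names are the lower-cased raw values, elements are the target labels
lookup <- c(
  male = "male", man = "male",
  female = "female", woman = "female",
  other = "non-binary", mtf = "non-binary",
  ftm = "non-binary", androgyne = "non-binary"
)

# Index by name; values missing from the lookup come back as NA
recoded <- unname(lookup[tolower(gender)])
```

Unmapped values return NA rather than a default label, which makes new or misspelled entries easy to spot.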
Data
removed <- structure(list(gender = c("Male", "Woman", "Male", "other", "MtF",
"female", "male", "MtF", "Male", "man", "Man", "female", "other",
"Woman", "female", "MtF", "male", "Female", "other", "other",
"FtM", "female", "Woman", "Woman", "female", "male", "androgyne",
"man", "Female", "MtF")), class = "data.frame", row.names = c(NA,
-30L))
Upvotes: 1