syd321

Reputation: 29

How do I categorise all the different entries into Male, Female and Non-Binary?

I want to change all the gender entries ('Male', 'Female', 'Woman', 'Man', 'man', etc.) to be more consistent, so that there are only 3 categories (Male, Female and Non-Binary). This is my current code:

# Cleaning of Specific Variable Types
removed <- removed %>%
  mutate(gender=substr(toupper(gender), 1, 1))

removed <- removed %>% 
  mutate(gender=case_when(
    gender == "M"~"Male",
    gender == "F"~"Female",
    gender == "N"~"Non-binary")
  )

Upvotes: 2

Views: 551

Answers (3)

Rui Barradas

Reputation: 76450

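The problem seems to be the missing default in case_when: after gender is reduced to its first letter, any value that matches none of the conditions (for example "O" from "other") becomes NA. Use TRUE as the final condition instead of matching "N", and also map "W" (from "Woman") to "Female".
Tested with the data in jay.sf's answer.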

library(dplyr)

removed %>%
  mutate(
    gender = toupper(substr(gender, 1, 1)),
    gender = case_when(
      gender == "M" ~ "Male",
      gender %in% c("F", "W") ~ "Female",
      TRUE ~ "Non-binary"
  ))
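
To illustrate the point about the default, here is a minimal sketch on a small made-up vector (`sample_df` and its values are assumptions for illustration, not the asker's data). It compares `case_when` with and without a `TRUE` catch-all:

```r
library(dplyr)

# Hypothetical sample, not the asker's real data
sample_df <- data.frame(gender = c("Male", "Woman", "other", "female"))

first_letter <- toupper(substr(sample_df$gender, 1, 1))

# Without a catch-all, unmatched letters ("W", "O") fall through to NA
no_default <- case_when(
  first_letter == "M" ~ "Male",
  first_letter == "F" ~ "Female",
  first_letter == "N" ~ "Non-binary"
)

# With TRUE as the final condition, every remaining value gets a label
with_default <- case_when(
  first_letter == "M" ~ "Male",
  first_letter %in% c("F", "W") ~ "Female",
  TRUE ~ "Non-binary"
)

# Inspect which first letters occur before relying on them
table(first_letter)
```

One caveat of first-letter matching worth checking on real data: "MtF" starts with "M" and "FtM" starts with "F", so they land in "Male" and "Female" respectively unless they are handled before the first letter is extracted.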

Upvotes: 2

TarJae

Reputation: 78937

This may be the long version, but it should work. Data from jay.sf (many thanks).

  1. Capitalize the first letter
  2. Check for unique entries in gender
  3. Create a pattern for each category
  4. Apply a case_when condition with str_detect and the patterns:
library(dplyr)
library(stringr)

# Title-case each value so that the pattern "Man" does not match
# inside "Woman" in str_detect
removed$gender <- str_to_title(removed$gender)

# check for unique elements in `gender`
unique(removed$gender)
# [1] "Male"      "Woman"     "Other"     "Mtf"       "Female"
# [6] "Man"       "Ftm"       "Androgyne"

# define pattern for each category
Male <- paste(c("Male", "Man"), collapse = "|")
Female <- paste(c("Woman", "Female"), collapse = "|")
Non_binary <- paste(c("Other", "Mtf", "Ftm", "Androgyne"), collapse = "|")

# apply category with `case_when` and pattern:
removed %>%
    mutate(gender = case_when(
        str_detect(gender, Male) ~ "Male",
        str_detect(gender, Female) ~ "Female",
        str_detect(gender, Non_binary) ~ "Non-binary"))

Output:

gender
1        Male
2      Female
3        Male
4  Non-binary
5  Non-binary
6      Female
7        Male
8  Non-binary
9        Male
10       Male
11       Male
12     Female
13 Non-binary
14     Female
15     Female
16 Non-binary
17       Male
18     Female
19 Non-binary
20 Non-binary
21 Non-binary
22     Female
23     Female
24     Female
25     Female
26       Male
27 Non-binary
28       Male
29     Female
30 Non-binary
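
As an alternative to title-casing, the patterns can be anchored so that they only match whole values. The sketch below uses base R `grepl` instead of `str_detect` and a made-up `gender` vector (both are assumptions for illustration):

```r
# Hypothetical sample values
gender <- c("Male", "woman", "MAN", "Female", "other")

# Unanchored, case-insensitive matching: "man" is also found inside "woman"
grepl("man", gender, ignore.case = TRUE)

# Anchored patterns (^ ... $) must match the whole value
is_male   <- grepl("^(male|man)$", gender, ignore.case = TRUE)
is_female <- grepl("^(female|woman)$", gender, ignore.case = TRUE)

# Everything that is neither male nor female falls into "Non-binary"
recoded <- ifelse(is_male, "Male",
                  ifelse(is_female, "Female", "Non-binary"))
```

With anchoring, the order of the conditions no longer matters, because each value can match at most one of the two patterns.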

Upvotes: 2

jay.sf

Reputation: 72984

You probably have a data frame like this.

removed
#   gender
# 1   Male
# 2  Woman
# 3   Male
# 4  other
# 5    MtF
# 6 female
# ...

You could now create a key table in a semi-automated way like so. The codes in y follow the alphabetical order of the sorted, lower-cased unique values.

key <- data.frame(x=sort(unique(tolower(removed$gender))),
                  y=factor(c(3, 1, 3, 2, 2, 3, 3, 1), 
                           labels=c('female', 'male', 'non-binary')))

Then use match to replace the labels.

library(dplyr)
removed %>% 
  mutate(gender=key$y[match(tolower(gender), key$x)])
#        gender
# 1        male
# 2      female
# 3        male
# 4  non-binary
# 5  non-binary
# 6      female
# 7         ...
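
A related sketch, assuming a hand-written named vector (`lookup`, hypothetical) instead of the positional key above: spelling out each raw value next to its category avoids relying on sort order, and any value missing from the key shows up as NA so it can be reviewed.

```r
# Hypothetical key: lower-cased raw value -> category
lookup <- c(male = "Male", man = "Male",
            female = "Female", woman = "Female",
            other = "Non-binary", mtf = "Non-binary",
            ftm = "Non-binary", androgyne = "Non-binary")

# Hypothetical sample; "unknown" is deliberately absent from the key
gender <- c("Male", "Woman", "MtF", "unknown")

# Index the named vector by the lower-cased values
recoded <- unname(lookup[tolower(gender)])

# Values missing from the key come back as NA, so list them for review
unmatched <- unique(gender[is.na(recoded)])
```

Checking `unmatched` after each run is a cheap way to catch new spellings that have not been added to the key yet.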

Data

removed <- structure(list(gender = c("Male", "Woman", "Male", "other", "MtF", 
"female", "male", "MtF", "Male", "man", "Man", "female", "other", 
"Woman", "female", "MtF", "male", "Female", "other", "other", 
"FtM", "female", "Woman", "Woman", "female", "male", "androgyne", 
"man", "Female", "MtF")), class = "data.frame", row.names = c(NA, 
-30L))

Upvotes: 1
