Using case when to evaluate variables across columns and assign buckets

Question

I have a dataset with race/ethnicity variables spread across several 0/1 columns. Individuals can select multiple categories, and I want to group people into a broader category when all of their selections fall within it.

For example, if someone selected two categories, south east asian and south asian, the new category would label that person as 'asian'. Likewise, if they chose cuban and mexican, I would label them 'hispanic/latino'. However, if someone chose south east asian and cuban, I would like to label them as multiracial. There are 20+ individual choices someone can make, and I want to combine them into the larger categories asian, black, white, hispanic/latino, etc.

# 1 = category selected, 0 = not selected (all columns must have the same length)
data <- data.frame(race_se_asian    = c(0,1,1,0,0,0,1,0,0,0),
                   race_south_asian = c(0,1,0,0,1,0,1,0,0,0),
                   race_european    = c(1,0,1,0,0,0,0,0,0,0),
                   race_cuban       = c(1,0,1,0,0,0,0,0,0,0),
                   race_mexican     = c(1,0,1,0,0,0,0,0,0,0))

I've also created a 'total categories' column that counts the number of selections each individual has made.
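A minimal sketch of how such a count could be computed, assuming the indicator columns are strictly 0/1 and all share the `race_` prefix; the column name `total_categories` is hypothetical, since the question does not show how its count was built.

```r
# Example data from the question (1 = selected, 0 = not selected)
data <- data.frame(race_se_asian    = c(0,1,1,0,0,0,1,0,0,0),
                   race_south_asian = c(0,1,0,0,1,0,1,0,0,0),
                   race_european    = c(1,0,1,0,0,0,0,0,0,0),
                   race_cuban       = c(1,0,1,0,0,0,0,0,0,0),
                   race_mexican     = c(1,0,1,0,0,0,0,0,0,0))

# Count selections per row, summing only the race_* indicator columns
data$total_categories <- rowSums(data[startsWith(names(data), "race_")])
```

If a count column like this is stored in `data`, it should be kept out of any per-row logic that looks at every column (for example `c_across(everything())`); otherwise a count of 1 would be read as one more selected category.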

Is there an easy way to do this with case_when or a for loop?
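Since the question asks about `case_when`, here is a sketch of that route, not taken from the answer below. It assumes a dplyr version that provides `if_any()`, and the column-to-group mapping (se/south asian to "asian", european to "european", cuban/mexican to "hispanic/latino") is an assumption mirroring the grouping used in the answer. The idea is to flag each broad group first, count how many groups were hit, and only then pick a label.

```r
library(dplyr)

# Example data from the question (1 = selected, 0 = not selected)
data <- data.frame(race_se_asian    = c(0,1,1,0,0,0,1,0,0,0),
                   race_south_asian = c(0,1,0,0,1,0,1,0,0,0),
                   race_european    = c(1,0,1,0,0,0,0,0,0,0),
                   race_cuban       = c(1,0,1,0,0,0,0,0,0,0),
                   race_mexican     = c(1,0,1,0,0,0,0,0,0,0))

out <- data %>%
  mutate(
    # TRUE if any column belonging to the broad group was selected
    # (this column-to-group mapping is assumed for illustration)
    any_asian    = if_any(c(race_se_asian, race_south_asian), ~ .x == 1),
    any_european = race_european == 1,
    any_hispanic = if_any(c(race_cuban, race_mexican), ~ .x == 1),
    # Number of distinct broad groups the person selected from
    n_groups     = any_asian + any_european + any_hispanic,
    # case_when() uses the first TRUE condition, so test multiracial first
    category = case_when(
      n_groups > 1 ~ "multiracial",
      any_asian    ~ "asian",
      any_european ~ "european",
      any_hispanic ~ "hispanic/latino",
      TRUE         ~ NA_character_
    )
  ) %>%
  # Drop the helper columns, keeping the indicators and the new category
  select(-starts_with("any_"), -n_groups)
```

With 20+ columns, each `if_any()` call just lists more columns, so the `case_when()` itself stays short.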

Rui Barradas · Accepted Answer

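Here is a way with an auxiliary function: map each column to a broad group, then for each row return that group if all selections share it, or "multiracial" if they don't.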

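library(dplyr)

# Return the broad group for one row of 0/1 indicators, "multiracial" if the
# selections span more than one group, or NA if nothing was selected
fun <- function(x, groups){
  i <- which(x == 1)            # positions of the selected columns
  if(length(i)){
    g <- groups[i, "groups"]    # broad group of each selected column
    g <- unique(g)
    if(length(g) == 1) g else "multiracial"
  } else NA_character_
}

# One broad group per column of `data`, in the same order as names(data)
groups <- c("asian", "asian", "european", "hispanic/latino", "hispanic/latino")
groups <- data.frame(ethnicity = names(data), groups)

# Apply fun to each row, across all columns
data %>%
  rowwise() %>%
  mutate(category = fun(c_across(everything()), groups))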
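The function above matches columns to groups by position, so it relies on `groups` listing the groups in exactly the order of `names(data)`, with no extra columns. A possible variant, sketched here in base R with `apply()` instead of `rowwise()`, looks groups up by column name instead. `lookup` and `assign_group` are hypothetical names, and the mapping is the same assumed one as above.

```r
# Example data from the question, plus an extra count column
data <- data.frame(race_se_asian    = c(0,1,1,0,0,0,1,0,0,0),
                   race_south_asian = c(0,1,0,0,1,0,1,0,0,0),
                   race_european    = c(1,0,1,0,0,0,0,0,0,0),
                   race_cuban       = c(1,0,1,0,0,0,0,0,0,0),
                   race_mexican     = c(1,0,1,0,0,0,0,0,0,0))
data$total_categories <- rowSums(data)

# Named lookup: indicator column -> broad group (assumed mapping)
lookup <- c(race_se_asian    = "asian",
            race_south_asian = "asian",
            race_european    = "european",
            race_cuban       = "hispanic/latino",
            race_mexican     = "hispanic/latino")

# x: one row of 0/1 values; cols: the names of those columns
assign_group <- function(x, cols, lookup) {
  g <- unique(lookup[cols[x == 1]])   # distinct groups of selected columns
  if (length(g) == 0) NA_character_   # nothing selected
  else if (length(g) == 1) unname(g)  # all selections in one group
  else "multiracial"                  # selections span several groups
}

# Only the columns named in the lookup are used, so other columns are ignored
race_cols <- names(lookup)
data$category <- apply(data[race_cols], 1, assign_group,
                       cols = race_cols, lookup = lookup)
```

Because the mapping is keyed by name, adding one of the 20+ choices means adding one entry to `lookup`, and extra columns such as a count are never read.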
