alexvc

Reputation: 67

Using case_when to evaluate variables across columns and assign buckets

I have a dataset with race/ethnicity variables spread across several columns. Individuals can select multiple categories, but I want to group people into a single broader category when all of their selections fall within it.

For example, if someone selected two categories, south east asian and south asian, the new category would label that person as 'asian'. Likewise, if they chose cuban and mexican, I would label them 'hispanic/latino'. However, if someone chose south east asian and cuban, I would like to label them as multiracial. I have 20+ individual choices someone can make and want to combine them into the larger categories asian, black, white, hispanic/latino, etc.

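# Each column is a 0/1 indicator; all columns must have the same length (10 rows here).
# The first vector originally had an 11th value, dropped so data.frame() accepts it.
data <- data.frame(race_se_asian = c(0,1,1,0,0,0,1,0,0,0),
                   race_south_asian = c(0,1,0,0,1,0,1,0,0,0),
                   race_european = c(1,0,1,0,0,0,0,0,0,0),
                   race_cuban = c(1,0,1,0,0,0,0,0,0,0),
                   race_mexican = c(1,0,1,0,0,0,0,0,0,0))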

I've also created a 'total categories' column that counts the number of selections each individual has made.

Is there an easy way to do this with case_when or a for loop?

Upvotes: 0

Views: 35

Answers (1)

Rui Barradas

Reputation: 76460

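Here is a way with an auxiliary function. It looks up the broader group of every selected column and returns that group when all selections agree, "multiracial" when they differ, and NA when nothing was selected.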

library(dplyr)

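# Return the broad group for one row of 0/1 indicators.
# x: indicator values of a single row; groups: lookup table with a "groups" column.
fun <- function(x, groups){
  i <- which(x == 1)              # positions of the selected categories
  if(length(i)){
    g <- groups[i, "groups"]      # broad group of each selection
    g <- unique(g)
    # one distinct group: use it; more than one: multiracial
    if(length(g) == 1) g else "multiracial"
  } else NA_character_            # nothing selected
}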

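# Lookup table: one broad group per column, in the same order as names(data)
groups <- c("asian", "asian", "european", "hispanic/latino", "hispanic/latino")
groups <- data.frame(ethnicity = names(data), groups)

data %>%
  rowwise() %>%
  mutate(category = fun(c_across(everything()), groups)) %>%
  ungroup()  # drop the rowwise grouping afterwards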
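Since the question asks about case_when specifically, here is a minimal sketch of that route. It assumes the five 0/1 columns from the question, with the sample data cut to 10 rows so the columns line up. The helper columns is_asian, is_european, is_hispanic and n_groups are names made up for illustration. With 20+ columns, each is_* line would list more columns joined by |.

```r
library(dplyr)

# Sample data assumed from the question (10 rows of 0/1 indicators)
data <- data.frame(
  race_se_asian    = c(0, 1, 1, 0, 0, 0, 1, 0, 0, 0),
  race_south_asian = c(0, 1, 0, 0, 1, 0, 1, 0, 0, 0),
  race_european    = c(1, 0, 1, 0, 0, 0, 0, 0, 0, 0),
  race_cuban       = c(1, 0, 1, 0, 0, 0, 0, 0, 0, 0),
  race_mexican     = c(1, 0, 1, 0, 0, 0, 0, 0, 0, 0)
)

result <- data %>%
  mutate(
    # One logical flag per broad group (hypothetical helper columns)
    is_asian    = race_se_asian == 1 | race_south_asian == 1,
    is_european = race_european == 1,
    is_hispanic = race_cuban == 1 | race_mexican == 1,
    # Number of distinct broad groups the person falls into
    n_groups    = is_asian + is_european + is_hispanic,
    # Conditions are checked in order, so the multiracial test comes first
    category = case_when(
      n_groups == 0 ~ NA_character_,
      n_groups > 1  ~ "multiracial",
      is_asian      ~ "asian",
      is_european   ~ "european",
      is_hispanic   ~ "hispanic/latino"
    )
  )

result$category
```

This avoids rowwise(), so it should scale better on large data, at the cost of spelling out each group by hand.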

Upvotes: 1
