Larry V
Larry V

Reputation: 143

Create a new column with all selected column names in R

It's been a while since I've had to drop by and ask for advice, but this one has me stumped and my searches for an answer haven't been fruitful (probably because the words I'm using are confusing search engines).

I have a survey data file that has columns that are either set to a 1 (if the respondent clicked the box) or are NA (if they didn't). I want to create a new field for each respondent that summarizes the checked boxes.

Below is a reprex with the starting df and the desire df2 I would like to create.

Any advice is greatly appreciated. Happy Sunday! And thanks in advance.

Original table:

library(tibble)
set.seed(125)
df <- tibble(
  id = 1:10,
  math = sample(c(rep(1, 3), rep(NA, 7))),
  english = sample(c(rep(1, 5), rep(NA, 5))),
  science = sample(c(rep(1, 6), rep(NA, 4))),
  history = sample(c(rep(1, 8), rep(NA, 2)))
)

Looks like this:

# A tibble: 10 x 5
#      id  math english science history
#   <int> <dbl>   <dbl>   <dbl>   <dbl>
# 1     1    NA       1      NA       1
# 2     2    NA      NA       1       1
# 3     3    NA       1       1       1
# 4     4     1      NA       1       1
# 5     5     1       1      NA       1
# 6     6    NA       1       1       1
# 7     7    NA      NA      NA      NA
# 8     8     1      NA       1       1
# 9     9    NA       1      NA       1
#10    10    NA      NA       1      NA

Desired table

# A tibble: 10 x 2
#      id background                 
#   <int> <chr>                      
# 1     1 "english, history"         
# 2     2 "science, history"         
# 3     3 "english, science, history"
# 4     4 "math, science, history"   
# 5     5 "math, english, history"   
# 6     6 "english, science, history"
# 7     7 ""                         
# 8     8 "math, science, history"   
# 9     9 "english, history"         
#10    10 "science"  

Created on 2022-01-30 by the reprex package (v2.0.1)

Upvotes: 2

Views: 78

Answers (2)

Andre Wildberg
Andre Wildberg

Reputation: 19181

You can use pivot_longer and filter with a final paste to get the desired output

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(cols=-id) %>% 
  group_by(id) %>% 
  mutate(value=replace_na(value, 0), 
         v_sum=sum(value), 
         name=ifelse(v_sum==0,"",name)) %>% 
  filter(value==1|v_sum==0) %>% 
  summarise(background=paste(unique(name), collapse=", "))
# A tibble: 10 × 2
      id background                 
   <int> <chr>                      
 1     1 "english, history"         
 2     2 "science, history"         
 3     3 "english, science, history"
 4     4 "math, science, history"   
 5     5 "math, english, history"   
 6     6 "english, science, history"
 7     7 ""                         
 8     8 "math, science, history"   
 9     9 "english, history"         
10    10 "science"

Upvotes: 0

lroha
lroha

Reputation: 34586

A dplyr/purrr approach:

library(dplyr)
library(purrr)

df %>%
  transmute(id = id, 
            background = pmap_chr(across(-id, ~ cur_column()[.x]), ~ toString(na.omit(c(...)))))

# A tibble: 10 x 2
      id background                 
   <int> <chr>                      
 1     1 "english, history"         
 2     2 "science, history"         
 3     3 "english, science, history"
 4     4 "math, science, history"   
 5     5 "math, english, history"   
 6     6 "english, science, history"
 7     7 ""                         
 8     8 "math, science, history"   
 9     9 "english, history"         
10    10 "science"     

   

Upvotes: 2

Related Questions