Kayci Muirbrook
Kayci Muirbrook

Reputation: 11

Concatenating partially duplicate responses from multiple choice in long data in R

I am rather new to R, and I have been trying to write a code that will find and concatenate multiple choice question responses when the data is in long format. The data needs to be pivoted wide, but cannot without resolving the duplicate IDs that result from these multiple choice responses. I want to combine the extra multiple choice response to the distinct ID number, so that it would look like: "affiliation 1, affiliation 2" for the individual respondent, in long format. I would prefer to not use row numbers, as the data is recollected on a monthly basis and row numbers may not stay constant. I need to identify the duplicate ID due to the multiple choice question, and attach its secondary answer to the other response.

I have tried various versions of aggregate, grouping and summarizing, filter, unique, and distinct, but haven't been able to solve the problem.

Here is an example of the data:

ID  Question       Response
1   question 1     affiliation x
1   question 2     course 1 
2   question 1     affiliation y
2   question 2     course 1
3   question 1     affiliation x  
3   question 1     affiliation z    
4   question 1     affiliation y

I want the data to look like this:

ID  Question       Response Text
1   question 1     affiliation x
1   question 2     course 1 
2   question 1     affiliation y
2   question 2     course 1
3   question 1     affiliation x, affiliation z      
4   question 1     affiliation y

so that it is prepared for pivot_wider.

Some example code that I've tried:

library(tidyverse)
course1 <- all_surveys %>%
  filter(`Survey Title`=="course 1") %>%
  aggregate("ID" ~ "Response Text", by(`User ID`, Question), FUN=sum) %>%
  pivot_wider(id_cols = c("ID", `Response Date`), 
              names_from = "Question", 
              values_from = "Response Text") %>%
select([questions to be retained from Question])

I have also tried

  group_by(question_new, `User ID`) %>%
  summarize(text = str_c("Response Text", collapse = ", "))

as well as

aggregate(c[("Response Text" ~ "question_new")], 
   by = list(`User ID` = `User ID`, `Response Date` = `Response Date`), 
   function(x) unique(na.omit(x)))

and a bunch of different iterations of the above. Thank you very much, in advance!

Upvotes: 1

Views: 114

Answers (1)

GuedesBF
GuedesBF

Reputation: 9858

We can try to pivot_wider using values_fn = toString:

df %>% pivot_wider(names_from = Question,
                   values_from = response,
                   values_fn = toString)

small minimal example

df<-tibble(ID = c(1,1,2,2), Question = c('question 1', 'question 2', 'question 1', 'question 1'), response = c('affiliation x', 'course 1', 'affiliation x', 'affiliation y'))

# A tibble: 4 × 3
     ID Question   response     
  <dbl> <chr>      <chr>        
1     1 question 1 affiliation x
2     1 question 2 course 1     
3     2 question 1 affiliation x
4     2 question 1 affiliation y

output

# A tibble: 2 × 3
     ID `question 1`                 `question 2`
  <dbl> <chr>                        <chr>       
1     1 affiliation x                course 1    
2     2 affiliation x, affiliation y NA        

Upvotes: 2

Related Questions