Reputation: 11
I am rather new to R, and I have been trying to write a code that will find and concatenate multiple choice question responses when the data is in long format. The data needs to be pivoted wide, but cannot without resolving the duplicate IDs that result from these multiple choice responses. I want to combine the extra multiple choice response to the distinct ID number, so that it would look like: "affiliation 1, affiliation 2"
for the individual respondent, in long format. I would prefer to not use row numbers, as the data is recollected on a monthly basis and row numbers may not stay constant. I need to identify the duplicate ID due to the multiple choice question, and attach its secondary answer to the other response.
I have tried various versions of aggregate, grouping and summarizing, filter, unique, and distinct, but haven't been able to solve the problem.
Here is an example of the data:
ID Question Response
1 question 1 affiliation x
1 question 2 course 1
2 question 1 affiliation y
2 question 2 course 1
3 question 1 affiliation x
3 question 1 affiliation z
4 question 1 affiliation y
I want the data to look like this:
ID Question Response Text
1 question 1 affiliation x
1 question 2 course 1
2 question 1 affiliation y
2 question 2 course 1
3 question 1 affiliation x, affiliation z
4 question 1 affiliation y
so that it is prepared for pivot_wider.
Some example code that I've tried:
library(tidyverse)
course1 <- all_surveys %>%
filter(`Survey Title`=="course 1") %>%
aggregate("ID" ~ "Response Text", by(`User ID`, Question), FUN=sum) %>%
pivot_wider(id_cols = c("ID", `Response Date`),
names_from = "Question",
values_from = "Response Text") %>%
select([questions to be retained from Question])
I have also tried
group_by(question_new, `User ID`) %>%
summarize(text = str_c("Response Text", collapse = ", "))
as well as
aggregate(c[("Response Text" ~ "question_new")],
by = list(`User ID` = `User ID`, `Response Date` = `Response Date`),
function(x) unique(na.omit(x)))
and a bunch of different iterations of the above. Thank you very much, in advance!
Upvotes: 1
Views: 114
Reputation: 9858
We can try to pivot_wider
using values_fn = toString
:
df %>% pivot_wider(names_from = Question,
values_from = response,
values_fn = toString)
small minimal example
df<-tibble(ID = c(1,1,2,2), Question = c('question 1', 'question 2', 'question 1', 'question 1'), response = c('affiliation x', 'course 1', 'affiliation x', 'affiliation y'))
# A tibble: 4 × 3
ID Question response
<dbl> <chr> <chr>
1 1 question 1 affiliation x
2 1 question 2 course 1
3 2 question 1 affiliation x
4 2 question 1 affiliation y
output
# A tibble: 2 × 3
ID `question 1` `question 2`
<dbl> <chr> <chr>
1 1 affiliation x course 1
2 2 affiliation x, affiliation y NA
Upvotes: 2