Reputation: 3
We frequently ask scale questions in our social surveys; respondents provides their agreement with our statement (strongly agree, agree, neither nor, disagree, strongly disagree). The survey result usually comes in an aggregated format, i.e for each question(variable), the answers are provided in a single column, where 5=strongly agree, 1=strongly disagree etc.
Now we came across a new survey tool where answers were partitions into several columns for one question. For example Q1_1 column = Strongly agree for Q1, Q1_5 column = Strongly disagree. So for each question we received 5 columns of answers, if respondent answered Strongly Agree, Q1_1 related row is marked as 1, where Q1_2 - Q1_5 related row for that respondent are marked as 0.
Please can anyone kindly share a solution to 'aggregated' the answers from the new survey tool, so instead of having 5 columns for each question, we would have one column per question, with value 1-5.
I'm new to R, I thought R would handle this instead of having to manually change in Excel.
Upvotes: 0
Views: 139
Reputation: 887048
We can use data.table
methods
library(data.table)
dcast(unique(melt(setDT(df), id.var = 'Respondent')[,
c('variable', 'answer') := tstrsplit(variable, '_',
type.convert = TRUE)][value == 1], by = "Respondent"),
Respondent ~ variable, value.var = 'answer')
-output
# Respondent Q6
# 1: Respondent1 1
# 2: Respondent10 1
# 3: Respondent2 2
# 4: Respondent3 1
# 5: Respondent4 1
# 6: Respondent5 1
# 7: Respondent6 1
# 8: Respondent7 2
# 9: Respondent8 2
#10: Respondent9 4
df <- structure(list(Respondent = c("Respondent1", "Respondent2", "Respondent3",
"Respondent4", "Respondent5", "Respondent6", "Respondent7", "Respondent8",
"Respondent9", "Respondent10"), Q6_1 = c(1, 0, 1, 1, 1, 1, 0,
0, 0, 1), Q6_2 = c(0, 1, 0, 0, 0, 0, 1, 1, 0, 1), Q6_3 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), Q6_4 = c(0, 0, 0, 0, 0, 0, 0, 0,
1, 0)), class = "data.frame", row.names = c(NA, -10L))
Upvotes: 0
Reputation: 39595
Try this approach reshaping and next time follow the advice from @r2evans as we have to type data. Here the code:
library(dplyr)
library(tidyr)
#Data
df <- data.frame(Respondent=paste0('Respondent',1:10),
Q6_1=c(1,0,1,1,1,1,0,0,0,1),
Q6_2=c(0,1,0,0,0,0,1,1,0,1),
Q6_3=rep(0,10),
Q6_4=c(rep(0,8),1,0),stringsAsFactors = F
)
#Code
new <- df %>% pivot_longer(-Respondent) %>%
separate(name,c('variable','answer'),sep='_') %>%
filter(value==1) %>%
select(-value) %>%
filter(!duplicated(Respondent)) %>%
pivot_wider(names_from = variable,values_from=answer)
Output:
# A tibble: 10 x 2
Respondent Q6
<chr> <chr>
1 Respondent1 1
2 Respondent2 2
3 Respondent3 1
4 Respondent4 1
5 Respondent5 1
6 Respondent6 1
7 Respondent7 2
8 Respondent8 2
9 Respondent9 4
10 Respondent10 1
I only curious why your data in case of member 10 have two values of 1. Maybe a typo or is that possible?
Upvotes: 1