Shawn
Shawn

Reputation: 3

How to use R for handling (pivoting?) social survey raw data?

We frequently ask scale questions in our social surveys; respondents provides their agreement with our statement (strongly agree, agree, neither nor, disagree, strongly disagree). The survey result usually comes in an aggregated format, i.e for each question(variable), the answers are provided in a single column, where 5=strongly agree, 1=strongly disagree etc.

enter image description here

Now we came across a new survey tool where answers were partitions into several columns for one question. For example Q1_1 column = Strongly agree for Q1, Q1_5 column = Strongly disagree. So for each question we received 5 columns of answers, if respondent answered Strongly Agree, Q1_1 related row is marked as 1, where Q1_2 - Q1_5 related row for that respondent are marked as 0.

enter image description here

Please can anyone kindly share a solution to 'aggregated' the answers from the new survey tool, so instead of having 5 columns for each question, we would have one column per question, with value 1-5.

I'm new to R, I thought R would handle this instead of having to manually change in Excel.

Upvotes: 0

Views: 139

Answers (2)

akrun
akrun

Reputation: 887048

We can use data.table methods

library(data.table)
dcast(unique(melt(setDT(df), id.var = 'Respondent')[,
  c('variable', 'answer') := tstrsplit(variable, '_', 
   type.convert = TRUE)][value == 1], by = "Respondent"), 
    Respondent  ~ variable, value.var = 'answer')

-output

#      Respondent Q6
# 1:  Respondent1  1
# 2: Respondent10  1
# 3:  Respondent2  2
# 4:  Respondent3  1
# 5:  Respondent4  1
# 6:  Respondent5  1
# 7:  Respondent6  1
# 8:  Respondent7  2
# 9:  Respondent8  2
#10:  Respondent9  4

data

df <- structure(list(Respondent = c("Respondent1", "Respondent2", "Respondent3", 
"Respondent4", "Respondent5", "Respondent6", "Respondent7", "Respondent8", 
"Respondent9", "Respondent10"), Q6_1 = c(1, 0, 1, 1, 1, 1, 0, 
0, 0, 1), Q6_2 = c(0, 1, 0, 0, 0, 0, 1, 1, 0, 1), Q6_3 = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0), Q6_4 = c(0, 0, 0, 0, 0, 0, 0, 0, 
1, 0)), class = "data.frame", row.names = c(NA, -10L))

Upvotes: 0

Duck
Duck

Reputation: 39595

Try this approach reshaping and next time follow the advice from @r2evans as we have to type data. Here the code:

library(dplyr)
library(tidyr)
#Data
df <- data.frame(Respondent=paste0('Respondent',1:10),
                 Q6_1=c(1,0,1,1,1,1,0,0,0,1),
                 Q6_2=c(0,1,0,0,0,0,1,1,0,1),
                 Q6_3=rep(0,10),
                 Q6_4=c(rep(0,8),1,0),stringsAsFactors = F
                 )
#Code
new <- df %>% pivot_longer(-Respondent) %>%
  separate(name,c('variable','answer'),sep='_') %>%
  filter(value==1) %>%
  select(-value) %>%
  filter(!duplicated(Respondent)) %>%
  pivot_wider(names_from = variable,values_from=answer)

Output:

# A tibble: 10 x 2
   Respondent   Q6   
   <chr>        <chr>
 1 Respondent1  1    
 2 Respondent2  2    
 3 Respondent3  1    
 4 Respondent4  1    
 5 Respondent5  1    
 6 Respondent6  1    
 7 Respondent7  2    
 8 Respondent8  2    
 9 Respondent9  4    
10 Respondent10 1 

I only curious why your data in case of member 10 have two values of 1. Maybe a typo or is that possible?

Upvotes: 1

Related Questions