lmcshane

Reputation: 1114

R multiple choice questionnaire data to ggplot

I have a Qualtrics multiple-choice question whose results I want to plot in R. Respondents can select more than one answer per question, and each answer choice is stored in its own column. For example, participant 1 selected choices 1 (Q1_1) and 3 (Q1_3). I want to collapse all answer choices into one bar graph, with one bar per response option (Q1_1:Q1_3), where each bar's height is the number of selections divided by the number of respondents who answered this question (in this case, 3).

df <- structure(list(Participant = 1:3, A = c("a", "a", ""), B = c("", "b", "b"), C = c("c", "c", "c")), .Names = c("Participant", "Q1_1", "Q1_2", "Q1_3"), row.names = c(NA, -3L), class = "data.frame")

I want to use ggplot2, perhaps with some sort of loop over Q1_1:Q1_3?

Upvotes: 0

Views: 3152

Answers (3)

Sandipan Dey

Reputation: 23109

I think you want something like this (proportion with a stacked bar chart):

  Participant Q1_1 Q1_2 Q1_3
1           1    a         c
2           2    a    a    c
3           3    c    b    c
4           4         b    d

library(reshape2)  # provides melt
library(ggplot2)

# ensure that all question columns have the same factor levels; blanks become NA and are ignored
for (i in 2:4) {
   df[,i] <- factor(df[,i], levels = letters[1:4])
}

# proportion of each choice within each question column
tdf <- as.data.frame(sapply(df[2:4], function(x) table(x)/sum(table(x))))
tdf$choice <- rownames(tdf)
tdf <- melt(tdf, id='choice')

ggplot(tdf, aes(variable, value, fill=choice)) + 
       geom_bar(stat='identity') + 
       xlab('Questions') + 
       ylab('Proportion of Choice')


Upvotes: 0

bVa

Reputation: 3938

Here is a solution using ddply from the plyr package.

library(plyr)     # provides ddply
library(ggplot2)

# I increased the number of participants so that every answer combination is likely to appear
df = data.frame(Participant = 1:100, 
Q1_1 = sample(c("a", ""), 100, replace = TRUE, prob = c(1/2, 1/2)), 
Q1_2 = sample(c("b", ""), 100, replace = TRUE, prob = c(2/3, 1/3)), 
Q1_3 = sample(c("c", ""), 100, replace = TRUE, prob = c(1/3, 2/3)))
# Concatenate the selected options into one combination per participant, e.g. "ac"
df$answer = paste0(df$Q1_1, df$Q1_2, df$Q1_3)

# Share of participants giving each combination
summ = ddply(df, c("answer"), summarize, freq = length(answer)/nrow(df))

## Re-ordering of factor levels of summ$answer
summ$answer <- factor(summ$answer, levels=c("", "a", "b", "c", "ab", "ac", "bc", "abc"))

# Plot 
ggplot(summ, aes(answer, freq, fill = answer)) + geom_bar(stat = "identity") + theme_bw() 


Note: this gets more complicated if you have more columns relating to other questions ("Q2_1", "Q2_2", ...). In that case, melting the data for each question could be a solution.
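As a rough illustration of the melting idea for several questions, here is a minimal sketch. It assumes the column names follow the "Q<question>_<option>" pattern and that a blank string means "not selected"; the data frame df2 and the Q2 columns are made up for the example, and it uses tidyr's pivot_longer rather than reshape2's melt.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Hypothetical data: two multi-select questions, blank = option not selected
df2 <- data.frame(
  Participant = 1:3,
  Q1_1 = c("a", "a", ""),
  Q1_2 = c("", "b", "b"),
  Q2_1 = c("x", "", ""),
  Q2_2 = c("y", "y", ""),
  stringsAsFactors = FALSE
)

# Reshape to one row per participant/option, then derive the question
# name from the part of the column name before the underscore
long <- df2 %>%
  pivot_longer(-Participant, names_to = "option", values_to = "response") %>%
  mutate(question = sub("_.*$", "", option)) %>%
  filter(response != "")

# Proportion of all participants who selected each option
summ2 <- long %>%
  group_by(question, option) %>%
  summarise(prop = n() / nrow(df2), .groups = "drop")

# One panel per question, one bar per option
p <- ggplot(summ2, aes(option, prop)) +
  geom_col() +
  facet_wrap(~ question, scales = "free_x")
```

Printing p draws the plot. Faceting by question keeps each question's options together without needing an explicit loop.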

Upvotes: 0

csiu

Reputation: 3269

Perhaps this is what you want:

f <- 
  structure(
    list(
      Participant = 1:3,
      A = c("a", "a", ""),
      B = c("", "b", "b"),
      C = c("c", "c", "c")),
    .Names = c("Participant", "Q1_1", "Q1_2", "Q1_3"),
    row.names = c(NA, -3L),
    class = "data.frame"
  )


library(tidyr)
library(dplyr)
library(ggplot2)

nparticipant <- nrow(f)
f %>% 
  ## Reformat the data
  gather(question, response, starts_with("Q")) %>%
  filter(response != "") %>%

  ## calculate the height of the bars
  group_by(question) %>%
  summarise(score = length(response)/nparticipant) %>%

  ## Plot
  ggplot(aes(x=question, y=score)) +
  geom_bar(stat = "identity")
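The pipeline above divides by nrow(f), i.e. by all participants. If some participants skipped the question entirely, the denominator the question asks for (respondents who answered) would be smaller. A possible variant, assuming that "answered" means at least one non-blank option, is sketched below; the data frame f2 and its fourth participant are hypothetical.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical data: participant 4 skipped the question entirely
f2 <- data.frame(
  Participant = 1:4,
  Q1_1 = c("a", "a", "", ""),
  Q1_2 = c("", "b", "b", ""),
  Q1_3 = c("c", "c", "c", ""),
  stringsAsFactors = FALSE
)

q_cols <- grep("^Q1_", names(f2), value = TRUE)

# Assumption: a respondent "answered" if at least one option is non-blank
n_answered <- sum(rowSums(f2[q_cols] != "") > 0)

# Count selections per option and divide by the number who answered
scores <- f2 %>%
  pivot_longer(all_of(q_cols), names_to = "question", values_to = "response") %>%
  filter(response != "") %>%
  count(question) %>%
  mutate(score = n / n_answered)

p2 <- ggplot(scores, aes(question, score)) + geom_col()
```

Here geom_col() is shorthand for geom_bar(stat = "identity"), and printing p2 draws the plot.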


Upvotes: 3
