Dr. Fabian Habersack

Reputation: 1141

How to plot multiple response survey items in R using ggplot2?

I have survey data structured into several item variables that denote whether something was mentioned (1) or not mentioned (2) by a survey respondent. Each row is a different respondent, who can choose all options a through c (as the first respondent in the data below does), none of them, or just some.

Let this be the dataset:

testdat<-data.frame(option_a=c(1,2,2,1,2),
                    option_b=c(1,1,2,1,2),
                    option_c=c(1,1,2,1,1))

What would be the easiest and fastest way to plot just the relative frequencies of how often each option was chosen? The outcome should be a geom_bar plot with three bars representing the different options (a: 40%, b: 60%, c: 80%). Put differently, I would like a plot from which I could say that a given option was chosen by x% of the respondents.

Is there a way by which I could do this directly in ggplot without having to restructure the dataset or replace 2s by 0s, etc.? I guess this should be fairly easy, but I just can't see it right now.

Upvotes: 2

Views: 2141

Answers (1)

Wietze314

Reputation: 6020

For a bar plot you need to reshape your data into long format. You cannot do that within the ggplot function itself. You can recode the values within ggplot, but you will also need to rename the fill legend.

testdat<-data.frame(option_a=c(1,2,2,1,2),
                    option_b=c(1,1,2,1,2),
                    option_c=c(1,1,2,1,1))

library(tidyverse)  # attaches ggplot2, dplyr and tidyr (among others)

testdat %>%
  gather(option,value) %>%
  ggplot(aes(x = factor(option), fill = factor((value-2)*-1))) +
  geom_bar()

To get percentages instead of counts, summarise the data before plotting, like so:

testdat %>%
  gather(option, value) %>%
  group_by(option,value) %>%
  summarise(n = n()) %>%
  group_by(option) %>%
  mutate(percentage = n/sum(n)*100) %>%
  ggplot(aes(x = factor(option), y = percentage, fill = factor((value-2)*-1))) +
  geom_bar(stat = "identity")
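As a hedged variant of the summary above: in current tidyr, gather() is superseded by pivot_longer(), and the legend can be relabelled by giving the fill factor explicit labels. The sketch below assumes tidyr 1.0.0 or later; the object name pct and the legend labels "mentioned" / "not mentioned" are illustrative choices, not part of the original answer.

```r
library(tidyverse)

testdat <- data.frame(option_a = c(1, 2, 2, 1, 2),
                      option_b = c(1, 1, 2, 1, 2),
                      option_c = c(1, 1, 2, 1, 1))

# Reshape to long format, count each (option, value) pair,
# then convert the counts to percentages within each option.
pct <- testdat %>%
  pivot_longer(everything(), names_to = "option", values_to = "value") %>%
  count(option, value) %>%
  group_by(option) %>%
  mutate(percentage = n / sum(n) * 100) %>%
  ungroup()

# Label the fill levels directly instead of recoding 2 -> 0 (labels are illustrative).
p <- ggplot(pct, aes(x = option, y = percentage,
                     fill = factor(value, levels = c(1, 2),
                                   labels = c("mentioned", "not mentioned")))) +
  geom_col() +
  labs(x = "Option", y = "Percentage", fill = "Response")
p
```

geom_col() is equivalent to geom_bar(stat = "identity"), so the plot has the same stacked bars as the version above.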

EDIT:

To show only the relative frequency of the "mentioned" responses (value 1) for each option:

testdat %>%
  gather(option, value) %>%
  group_by(option,value) %>%
  summarise(n = n()) %>%
  group_by(option) %>%
  mutate(percentage = n/sum(n)*100) %>%
  filter(value == 1) %>%
  ggplot(aes(x = factor(option), y = percentage, fill = factor((value-2)*-1))) +
  geom_bar(stat = "identity")
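If only the share of "mentioned" responses is needed, a shorter route may be to compute the column-wise proportions in base R first and plot the resulting small data frame. This is a minimal sketch, not the approach of the answer above; the names shares and plotdat are hypothetical, and it assumes every column is coded 1 = mentioned with no missing values.

```r
library(ggplot2)

testdat <- data.frame(option_a = c(1, 2, 2, 1, 2),
                      option_b = c(1, 1, 2, 1, 2),
                      option_c = c(1, 1, 2, 1, 1))

# testdat == 1 is a logical matrix; its column means are the shares of 1s.
shares <- colMeans(testdat == 1) * 100

# One row per option, ready for plotting.
plotdat <- data.frame(option = names(shares), percentage = unname(shares))

p <- ggplot(plotdat, aes(x = option, y = percentage)) +
  geom_col() +
  labs(x = "Option", y = "Mentioned (%)")
p
```

For the example data this gives one bar per option at roughly 40, 60 and 80 percent. The data still has to be summarised before it reaches ggplot, but no reshaping to long format or recoding of 2s is needed.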

Upvotes: 2
