user14212134


Plotting multiple binary variables on the same plot in ggplot

I want to use ggplot2 to build a bar plot of the frequencies (or just the percentage of 1s) of several binary variables, but I'm having trouble getting them all onto one plot.

The variables all come from the same survey question, so ideally the data would be tidy, with a single column for that question. However, respondents could select more than one option, and I'd like to keep that information rather than collapsing it into a "more than one selected" category. Here is a slice of the data (output of dput):

df <- structure(list(gender = structure(c("Male", "Male", "Female", 
"Female", "Female", "Female", "Male", "Male", "Male", "Male"), label = "Q4", format.stata = "%24s"), 
    var1 = structure(c("0", "0", "1", "1", "0", "0", "0", "0", 
    "0", "0"), format.stata = "%9s"), var2 = structure(c("0", 
    "98", "1", "0", "0", "0", "0", "0", "0", "0"), format.stata = "%9s"), 
    var3 = structure(c("0", "0", "0", "0", "0", "0", "0", "0", 
    "0", "0"), format.stata = "%9s"), var4 = structure(c("1", 
    "0", "1", "0", "0", "0", "1", "1", "0", "0"), format.stata = "%9s"), 
    var5 = structure(c("1", "0", "0", "0", "0", "1", "0", "0", 
    "0", "0"), format.stata = "%9s")), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 1

Views: 1234

Answers (1)

Ronak Shah

Reputation: 389135

Get the data in long format so that it is easier to plot.

library(tidyverse)

df %>%
  pivot_longer(cols = starts_with('var')) %>%
  group_by(name) %>%
  summarise(frequency_of_1 = sum(value == 1)) %>%
  # For the proportion of 1s, use mean() instead of sum();
  # multiply by 100 for a percentage:
  # summarise(frequency_of_1 = mean(value == 1) * 100) %>%
  ggplot() + aes(name, frequency_of_1) + geom_col()
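If you also want to compare the options by gender (the first column of the data), a minimal sketch along the same lines is to group by gender as well as by variable and dodge the bars. The column names option and pct_selected are illustrative choices, not part of the original answer, and any value other than "1" (including the "98" in var2) is counted as not selected.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Slice of the survey data from the question (values stored as character)
df <- data.frame(
  gender = c("Male", "Male", "Female", "Female", "Female",
             "Female", "Male", "Male", "Male", "Male"),
  var1 = c("0", "0", "1", "1", "0", "0", "0", "0", "0", "0"),
  var2 = c("0", "98", "1", "0", "0", "0", "0", "0", "0", "0"),
  var3 = c("0", "0", "0", "0", "0", "0", "0", "0", "0", "0"),
  var4 = c("1", "0", "1", "0", "0", "0", "1", "1", "0", "0"),
  var5 = c("1", "0", "0", "0", "0", "1", "0", "0", "0", "0"),
  stringsAsFactors = FALSE
)

pct_by_gender <- df %>%
  # One row per respondent x option; gender is carried along
  pivot_longer(cols = starts_with("var"),
               names_to = "option", values_to = "value") %>%
  group_by(gender, option) %>%
  # Percentage of respondents in each gender who selected the option
  summarise(pct_selected = 100 * mean(value == "1"), .groups = "drop")

# Side-by-side bars: one bar per gender within each option
p <- ggplot(pct_by_gender, aes(option, pct_selected, fill = gender)) +
  geom_col(position = "dodge") +
  labs(x = NULL, y = "% selecting option", fill = "Gender")
p
```

Because the percentage is computed within each gender, bars stay comparable even though the two groups have different sizes.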



In base R you can do this with colSums and barplot.

barplot(colSums(df[-1] == 1))
# For the proportion of 1s, use colMeans(); multiply by 100 for a percentage
# barplot(colMeans(df[-1] == 1) * 100)
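Note that var2 contains a "98", which the code above counts as "not 1", so it stays in the denominator of the percentage. If that code means something like "don't know" or "refused" (an assumption; check the survey codebook), a minimal base R sketch that treats it as missing before computing percentages:

```r
# Slice of the survey data from the question (values stored as character)
df <- data.frame(
  gender = c("Male", "Male", "Female", "Female", "Female",
             "Female", "Male", "Male", "Male", "Male"),
  var1 = c("0", "0", "1", "1", "0", "0", "0", "0", "0", "0"),
  var2 = c("0", "98", "1", "0", "0", "0", "0", "0", "0", "0"),
  var3 = c("0", "0", "0", "0", "0", "0", "0", "0", "0", "0"),
  var4 = c("1", "0", "1", "0", "0", "0", "1", "1", "0", "0"),
  var5 = c("1", "0", "0", "0", "0", "1", "0", "0", "0", "0"),
  stringsAsFactors = FALSE
)

# Numeric matrix of the option columns ("0"/"1"/"98" -> 0/1/98)
opts <- sapply(df[-1], as.numeric)

# Assumption: 98 is a non-substantive survey code, so treat it as missing
opts[opts == 98] <- NA

# Percentage of non-missing answers equal to 1, per option
pct_selected <- 100 * colMeans(opts == 1, na.rm = TRUE)

barplot(pct_selected, ylab = "% selecting option", ylim = c(0, 100))
```

With this recoding, var2 is computed over 9 answers instead of 10; the other columns are unchanged.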

Upvotes: 1
