I am hoping to use ggplot to build a bar plot of the frequencies (or just the percentage of 1s) of several binary variables, and I am having trouble getting them all onto one plot.
The variables all stem from the same survey question, so ideally the data would be tidy, with a single column for this question. However, respondents could select more than one option, and I would like to retain that rather than collapse it into a "more than one selected" category. Here is a slice of the data:
structure(list(gender = structure(c("Male", "Male", "Female",
"Female", "Female", "Female", "Male", "Male", "Male", "Male"), label = "Q4", format.stata = "%24s"),
var1 = structure(c("0", "0", "1", "1", "0", "0", "0", "0",
"0", "0"), format.stata = "%9s"), var2 = structure(c("0",
"98", "1", "0", "0", "0", "0", "0", "0", "0"), format.stata = "%9s"),
var3 = structure(c("0", "0", "0", "0", "0", "0", "0", "0",
"0", "0"), format.stata = "%9s"), var4 = structure(c("1",
"0", "1", "0", "0", "0", "1", "1", "0", "0"), format.stata = "%9s"),
var5 = structure(c("1", "0", "0", "0", "0", "1", "0", "0",
"0", "0"), format.stata = "%9s")), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))
Get the data in long format so that it is easier to plot.
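library(tidyverse)

# df is the data frame from the question
df %>%
  # Stack the var* columns into two columns: name (variable) and value
  pivot_longer(cols = starts_with('var')) %>%
  group_by(name) %>%
  # Count the 1s per variable
  summarise(frequency_of_1 = sum(value == 1)) %>%
  # For the proportion of 1s, use mean() instead of sum():
  # summarise(frequency_of_1 = mean(value == 1)) %>%
  ggplot(aes(name, frequency_of_1)) +
  geom_col()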
In base R you can do this with colSums and barplot.
barplot(colSums(df[-1] == 1))
#For percentage
#barplot(colMeans(df[-1] == 1))
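As a minimal sketch of the percentage variant, here is the base R approach on a small made-up data frame (the `toy` values and the `pct_ones` name are hypothetical, not the survey data). It also assumes that codes other than "0" and "1", such as the "98" in var2, are non-response codes that should be treated as missing rather than counted as 0; drop that step if the assumption does not fit your data.

```r
# Hypothetical toy data shaped like the question's: character-coded 0/1 columns
toy <- data.frame(
  gender = c("Male", "Female", "Female", "Male"),
  var1   = c("0", "1", "1", "0"),
  var2   = c("0", "98", "1", "0"),
  stringsAsFactors = FALSE
)

# Keep only the binary columns and convert them from character to numeric
vars <- toy[-1]
vars[] <- lapply(vars, as.numeric)

# Assumption: anything other than 0 or 1 (e.g. 98) is a non-response code
vars[vars != 0 & vars != 1] <- NA

# Percentage of 1s per variable, ignoring the missing values
pct_ones <- 100 * colMeans(vars == 1, na.rm = TRUE)

barplot(pct_ones, ylab = "% selecting option")
```

Without the recoding step, a "98" is simply counted as "not 1", which lowers the percentage for that variable; which behaviour is right depends on what the code means in your survey.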