Reputation: 553
I want to find which city has the highest proportion of people with kids (yes/no).
> dput(df)
structure(list(City = c("Manhattan", "Los Angeles", "Manhattan",
"Boston", "Dallas", "Los Angeles", "Dallas", "Los Angeles", "Dallas",
"Manhattan", "Boston", "Manhattan"), Has_Kids = c(0L, 0L, 0L,
1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-12L))
Right now I have code to compute the mean, but I would also like to add error bars to see whether any differences are significant:
library(dplyr)
library(ggplot2)

df %>%
  group_by(City) %>%
  # Percentage of rows in each city where Has_Kids == 1
  dplyr::summarise(`Kids Percent` = 100 * mean(Has_Kids == 1)) %>%
  ggplot(aes(x = City, y = `Kids Percent`, fill = City)) +
  # Value label just above each bar
  geom_text(
    aes(label = round(`Kids Percent`, 2)),
    vjust = -0.3,
    size = 2.5,
    na.rm = TRUE
  ) +
  geom_bar(stat = "identity", na.rm = TRUE) +
  theme_bw() +
  labs(title = "Kids by City [Proportion]",
       x = "City", y = "%") +
  theme(axis.text.x = element_text(
    angle = 90,
    vjust = 0.5,
    hjust = 1
  ))
Edit: I'm also open to other, potentially better, ways to visualize these data. My real dataset is similar in nature, but it has around 200k rows. Please suggest any better visualization methods if you know of any.
Upvotes: 0
Views: 322
Reputation: 160
I don't have a high enough reputation to comment, so I'm forced to "answer".
This sort of plot (sometimes called a dynamite plot, because it looks like a cartoonish stick of dynamite with a wick poking out) is not highly regarded, because it does not communicate the structure of the data very effectively.
Consider "Dynamite Plots Must Die", which contains some alternatives (in ggplot).
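As one possible alternative for a yes/no variable like Has_Kids, here is a minimal sketch that plots each city's percentage as a point with a 95% interval instead of a bar. It is only an illustration under some assumptions of mine: I picked exact (Clopper-Pearson) binomial intervals via binom.test(), which is just one of several reasonable choices, and the names ci_df, n_total, kids, pct, lower and upper are made up for this example.

```r
library(dplyr)
library(ggplot2)

# Sample data from the question
df <- data.frame(
  City = c("Manhattan", "Los Angeles", "Manhattan", "Boston", "Dallas",
           "Los Angeles", "Dallas", "Los Angeles", "Dallas", "Manhattan",
           "Boston", "Manhattan"),
  Has_Kids = c(0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L)
)

# Count rows and "has kids" rows per city, then compute the percentage
# and an exact 95% binomial interval (my choice, not the only option)
ci_df <- df %>%
  group_by(City) %>%
  summarise(n_total = n(), kids = sum(Has_Kids == 1), .groups = "drop") %>%
  rowwise() %>%
  mutate(
    pct   = 100 * kids / n_total,
    lower = 100 * binom.test(kids, n_total)$conf.int[1],
    upper = 100 * binom.test(kids, n_total)$conf.int[2]
  ) %>%
  ungroup()

# Point estimate with interval, cities ordered by percentage
p <- ggplot(ci_df, aes(x = reorder(City, pct), y = pct)) +
  geom_pointrange(aes(ymin = lower, ymax = upper)) +
  coord_flip() +
  theme_bw() +
  labs(title = "Kids by City [Proportion]",
       x = "City", y = "% with kids (95% interval)")
print(p)
```

With only 12 rows the intervals are very wide and overlap heavily, so nothing can be concluded from the sample data. With around 200k rows they should be much narrower, although overlapping or non-overlapping intervals are only a rough guide to significance, not a formal test.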
Upvotes: 1