anelson

Reputation: 55

ggplot: why is the y-scale larger than the actual values for each response?

This is likely a dumb question, but I cannot find a solution. I am trying to plot a categorical variable (3 groups) on the x-axis and a continuous variable (a percentage from 0 to 100) on the y-axis. To do so, I have to either specify stat = "identity" in geom_bar or use geom_col.

However, the y-axis still runs up to 4000, even after following the advice in "Y-scale issue in ggplot" and "Why is the value of y bar larger than the actual range of y in stacked bar plot?".

Here is how the graph keeps coming out:

[Image: bar plot of confid % by group, with the y-axis running up to about 4000]

I also double-checked that the x variable is a factor and the y variable is numeric. Why does the y-axis still reach 4000 instead of topping out at 100, as a percentage should?

EDIT: The y-values are simply responses from participants. I have a large dataset (N = 600), and each y-value is a percentage from 0 to 100 given by one participant. So, in each group (N = 200 per group), I have one percentage per participant. I want to visually compare the three groups based on the percentages they gave.

This is the code I used to plot the graph.

df$group <- as.factor(df$group)
df$confid <- as.numeric(df$confid)

library(ggplot2)
plot <- ggplot(df, aes(group, confid)) +
  geom_col() +
  ylab("confid %") +
  xlab("group")

Upvotes: 0

Views: 1495

Answers (1)

Allan Cameron

Reputation: 173803

Are you perhaps trying to plot the mean percentage in each group? Otherwise, it is not clear how a bar plot could easily represent what you are looking for. You could perhaps add error bars to give an idea of the spread of responses.

Suppose your data looks like this:

set.seed(4)

df <- data.frame(group = factor(rep(1:3, each = 200)),
                 confid = sample(40, 600, TRUE))

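Using your plotting code, we get results very similar to yours. geom_col draws one bar segment per row and stacks them, so each bar's height is the sum of all 200 responses in that group rather than a single percentage: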

library(ggplot2)
plot <- ggplot(df, aes(group, confid)) +
  geom_col() +
  ylab("confid %") +
  xlab("group")

plot

[Image: bar plot of the sample data by group, with bars reaching into the thousands]
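To see where a y-axis value in the thousands comes from, it may help to compute the group totals and group means directly. This is a minimal sketch in base R using the same simulated data as above; the names group_sums and group_means are introduced here purely for illustration.

```r
# Same simulated data as in the answer
set.seed(4)
df <- data.frame(group = factor(rep(1:3, each = 200)),
                 confid = sample(40, 600, TRUE))

# geom_col() stacks one segment per row, so the bar height is the group total
group_sums <- tapply(df$confid, df$group, sum)

# The group mean stays on the same scale as the individual responses
group_means <- tapply(df$confid, df$group, mean)

print(group_sums)   # each total is 200 times the corresponding mean
print(group_means)
```

With 200 rows per group, each stacked bar is 200 times the group mean, which is why percentages that never exceed 100 can still produce bars in the thousands.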

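However, if we use stat_summary, we can instead plot the mean of each group, with error bars showing the standard error (when no summary function is supplied, the summary stat defaults to mean_se()):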

ggplot(df, aes(group, confid)) +
  stat_summary(geom = "bar", fun = mean, width = 0.6, 
               fill = "deepskyblue", color = "gray50") +
  geom_errorbar(stat = "summary", width = 0.5) +
  geom_point(stat = "summary") +
  ylab("confid %") + 
  xlab("group")

[Image: bar plot of the group means, with points and standard-error bars]
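If you would rather compute the summaries yourself, one alternative is to build a small summary table first and pass that to geom_col. The sketch below assumes the same simulated data as above. The helper se() and the data frame summary_df are hypothetical names chosen for this example, not part of ggplot2, and the plotting step runs only if ggplot2 is installed.

```r
# Same simulated data as in the answer
set.seed(4)
df <- data.frame(group = factor(rep(1:3, each = 200)),
                 confid = sample(40, 600, TRUE))

# Standard error of the mean: sd / sqrt(n) (hypothetical helper)
se <- function(x) sd(x) / sqrt(length(x))

# One row per group: mean, standard error, and error-bar limits
summary_df <- data.frame(
  group = levels(df$group),
  mean  = as.numeric(tapply(df$confid, df$group, mean)),
  se    = as.numeric(tapply(df$confid, df$group, se))
)
summary_df$ymin <- summary_df$mean - summary_df$se
summary_df$ymax <- summary_df$mean + summary_df$se
print(summary_df)

# Plot the pre-computed means; skipped when ggplot2 is not available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(summary_df, aes(group, mean)) +
    geom_col(width = 0.6, fill = "deepskyblue", color = "gray50") +
    geom_errorbar(aes(ymin = ymin, ymax = ymax), width = 0.5) +
    ylab("confid %") +
    xlab("group")
  print(p)
}
```

Because summary_df has exactly one row per group, geom_col has nothing to stack, so the bar heights are the means themselves.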

Upvotes: 2
