I am trying to plot a histogram using ggplot2 with percentages on the y-axis and numerical values on the x-axis.
A sample of my data and my script are below; the full data set has about 100,000 rows or more.
A B
0.2 x
1 y
0.995 x
0.5 x
0.5 x
0.2 y
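To make the example reproducible, the sample rows above can be entered as an R data frame. This is a sketch; the column types (A numeric, B a character label) are assumed from the sample.

```r
# Recreate the six sample rows from the question as a data frame.
# Assumption: A is numeric and B is a character label.
data <- data.frame(
  A = c(0.2, 1, 0.995, 0.5, 0.5, 0.2),
  B = c("x", "y", "x", "x", "x", "y"),
  stringsAsFactors = FALSE
)
str(data)
```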
library(ggplot2); library(scales)  # scales provides percent()
ggplot(data, aes(A, colour = B)) + geom_bar() + stat_bin(breaks = seq(0, 1, by = 0.05)) + scale_y_continuous(labels = percent)
I want the y-axis to show the percentage of B values that fall in each bin of A, rather than the count of B values per bin.
As written, the code gives a y-axis that goes up to 15000 (raw counts). I want the y-axis in percentages (0 to 100).
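Note that scale_y_continuous(labels = percent) only changes how the axis labels are printed: percent() multiplies each break value by 100 and appends "%". It does not turn counts into proportions, so the bar heights are still raw counts. One way to get proportions is to compute them inside the histogram layer. The sketch below assumes ggplot2 3.3.0 or later (for after_stat(); older versions use the ..count.. notation instead) and the scales package. It divides each bin's count by the total number of rows, so all bars together add up to 100%.

```r
library(ggplot2)
library(scales)

# Sample rows from the question; the real data has ~100,000 rows.
data <- data.frame(
  A = c(0.2, 1, 0.995, 0.5, 0.5, 0.2),
  B = c("x", "y", "x", "x", "x", "y")
)

p <- ggplot(data, aes(x = A, fill = B)) +
  # Bin A into 0.05-wide bins; each bar's height is that bin's count
  # divided by the total count over all bins and both levels of B.
  geom_histogram(
    aes(y = after_stat(count / sum(count))),
    breaks = seq(0, 1, by = 0.05)
  ) +
  # percent() only formats the labels, e.g. 0.25 is shown as 25%.
  scale_y_continuous(labels = percent)

print(p)
```

Because geom_histogram() stacks the groups by default, the stacked bar for each bin shows that bin's share of all rows, split by B.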
Is this what you want? I assume your data frame is called df:
library(ggplot2)
# proportion of all rows in each combination of A and B (all cells sum to 1)
df2 <- as.data.frame(with(df, prop.table(table(A, B))))
df2
# A B Freq
# 1 0.2 x 0.1666667
# 2 0.5 x 0.3333333
# 3 0.995 x 0.1666667
# 4 1 x 0.0000000
# 5 0.2 y 0.1666667
# 6 0.5 y 0.0000000
# 7 0.995 y 0.0000000
# 8 1 y 0.1666667
ggplot(data = df2, aes(x = A, y = Freq, fill = B)) +
  geom_bar(stat = "identity", position = position_dodge())
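The table above uses each distinct value of A as its own category, while the question bins A into 0.05-wide intervals. A minimal sketch of the same prop.table() approach with binning first, assuming the bins from the question's seq(0, 1, by = 0.05) and the scales package for the percent labels (A_bin is a hypothetical column name introduced here):

```r
library(ggplot2)
library(scales)

# Sample rows from the question.
df <- data.frame(
  A = c(0.2, 1, 0.995, 0.5, 0.5, 0.2),
  B = c("x", "y", "x", "x", "x", "y")
)

# Assign each A value to a 0.05-wide bin; include.lowest = TRUE keeps A = 0.
df$A_bin <- cut(df$A, breaks = seq(0, 1, by = 0.05), include.lowest = TRUE)

# Share of all rows in each (bin, B) combination; empty bins are kept
# because A_bin is a factor, and all cells together sum to 1.
df2 <- as.data.frame(with(df, prop.table(table(A_bin, B))))

p <- ggplot(df2, aes(x = A_bin, y = Freq, fill = B)) +
  geom_col(position = position_dodge()) +
  scale_y_continuous(labels = percent) +
  # Bin labels such as "(0.15,0.2]" are long, so rotate them.
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

print(p)
```

Computing the table before plotting makes it easy to check the numbers. If you instead want the share of x and y within each bin, prop.table(table(A_bin, B), margin = 1) normalizes by row, although bins with no data then give NaN.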