Reputation: 69
I am looking for a solution in [R] to make a chart like I show here (it is made in Excel):
I can make a histogram using the code below:
# percent_format() comes from the scales package
ggplot(data = TestData, aes(x = QP1)) +
  geom_histogram(aes(y = (..count..)/sum(..count..)), binwidth = 0.1, fill = "lightblue", color = "black") +
  scale_x_continuous(breaks = seq(8, 10, 0.1)) +
  scale_y_continuous(labels = percent_format(), breaks = seq(0, 1, 0.05)) +
  xlab("QP1")
but I could not add a secondary axis and a line plot overlaid on the histogram. I found several examples on this site asking similar questions, but I still had difficulty truly understanding those solutions.
I need help for:
Thanks.
Edit: I achieved what I initially wanted. A further enhancement, adding data labels, is proving to be a challenge:
ggplot(data = TestData, aes(x = QP1, y = after_stat(count / sum(count)))) +
  geom_histogram(fill = "darkorange", color = "black", binwidth = 0.1) +
  stat_bin(aes(y = after_stat(cumsum(count / sum(count)) * 0.5)),
           geom = "line", colour = "dodgerblue", binwidth = 0.1) +
  stat_bin(aes(label = after_stat(scales::percent(count / sum(count)))),
           geom = "text", colour = "blue", binwidth = 0.1, vjust = 1) +
  stat_bin(aes(label = after_stat(scales::percent(cumsum((count / sum(count)))))),
           geom = "text", colour = "blue", binwidth = 0.1, vjust = -4) +
  scale_y_continuous(
    labels = scales::percent, breaks = seq(0, 5, .1),
    name = "Proportion",
    sec.axis = sec_axis(~ .x * 2,
                        name = "Cumulative Proportion",
                        labels = scales::percent, breaks = seq(0, 1, .2)))
The data labels are added and show the correct numbers, but the cumulative labels need to be positioned according to sec.axis. How can I do that? If I transform the label expression by adding or dividing, the label value changes, not its position. Please suggest.
Upvotes: 1
Views: 1274
Reputation: 37903
So a couple of things about secondary axes:
(1) The data for the secondary axis has to be rescaled by hand so that it fits within the range of the primary axis.
(2) The inverse of that rescaling goes into the trans argument of the secondary axis.
In the code below we achieve (1) by doing y = after_stat(cumsum(count / sum(count)) * 0.1). The after_stat() part replaces the older ..variable.. syntax. The cumsum() calculates the cumulative sum of the proportions, giving the cumulative proportions. The * 0.1 divides the cumulative data by 10 to achieve (1). Then, to achieve (2), you give the secondary axis ~ .x * 10 to scale the numbers on the axis itself back up. You can change these scaling factors depending on the plot, but be sure to change them in both places.
library(ggplot2)

df <- data.frame(
  x = rnorm(100)
)

ggplot(df, aes(x)) +
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 fill = "darkorange") +
  stat_bin(aes(y = after_stat(cumsum(count / sum(count)) * 0.1)),
           geom = "line", colour = "dodgerblue") +
  # Set secondary axis in y scale
  scale_y_continuous(
    labels = scales::percent,
    name = "Proportion",
    sec.axis = sec_axis(~ .x * 10,
                        name = "Cumulative Proportion",
                        labels = scales::percent)
  ) +
  # For pretty colours
  theme(
    axis.line.y.left   = element_line(colour = "darkorange"),
    axis.text.y.left   = element_text(colour = "darkorange"),
    axis.ticks.y.left  = element_line(colour = "darkorange"),
    axis.title.y.left  = element_text(colour = "darkorange"),
    axis.line.y.right  = element_line(colour = "dodgerblue"),
    axis.text.y.right  = element_text(colour = "dodgerblue"),
    axis.ticks.y.right = element_line(colour = "dodgerblue"),
    axis.title.y.right = element_text(colour = "dodgerblue")
  )
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Created on 2021-01-21 by the reprex package (v0.3.0)
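Regarding the label question in the edit, here is a rough sketch of one possible approach rather than a definitive fix. It assumes that the position of a text layer is driven by its y aesthetic and the printed text by its label aesthetic, so the label layer can reuse the rescaled expression for y while keeping the unscaled cumulative share for label. The variable scale_factor is a hypothetical name introduced here so the rescaling and its inverse cannot drift apart; the data is simulated, not your TestData.

```r
library(ggplot2)

set.seed(1)                       # simulated data for illustration only
df <- data.frame(x = rnorm(100))

scale_factor <- 0.1               # hypothetical name: shrinks the 0-1 cumulative range

p <- ggplot(df, aes(x)) +
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 bins = 30, fill = "darkorange") +
  # Cumulative line, rescaled so it fits on the primary axis
  stat_bin(aes(y = after_stat(cumsum(count / sum(count)) * scale_factor)),
           bins = 30, geom = "line", colour = "dodgerblue") +
  # Labels: y (position) uses the rescaled value, label (text) the unscaled one
  stat_bin(aes(y = after_stat(cumsum(count / sum(count)) * scale_factor),
               label = after_stat(scales::percent(cumsum(count / sum(count))))),
           bins = 30, geom = "text", colour = "dodgerblue",
           vjust = -0.5, size = 3) +
  scale_y_continuous(
    labels = scales::percent,
    name = "Proportion",
    # Inverse of the rescaling, so the right-hand axis reads as cumulative share
    sec.axis = sec_axis(~ .x / scale_factor,
                        name = "Cumulative Proportion",
                        labels = scales::percent)
  )

p
```

With the factors from your edit, scale_factor would be 0.5 and the secondary axis would then multiply by 2, as it already does.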
EDIT:
With regard to sec_axis(~ .x * 10, ...), this is called 'lambda syntax': you write a one-sided formula (only the right-hand side is defined), which is converted to a function by rlang::as_function(). The .x is a placeholder for the input data, so ~ .x * 10 can be read as function(x) {x * 10}. This is not a general R feature, but many tidyverse packages accept this notation at various points.
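To make the equivalence concrete, here is a small sketch that converts the formula by hand and compares it with an ordinary function. The names lambda, plain and primary_values are made up for this illustration.

```r
library(rlang)

# One-sided formula turned into a function, as happens to the sec_axis() argument
lambda <- as_function(~ .x * 10)

# Ordinary function with the same behaviour
plain <- function(x) {
  x * 10
}

primary_values <- c(0, 0.025, 0.05, 0.1)   # illustrative primary-axis values

lambda(primary_values)
plain(primary_values)
identical(lambda(primary_values), plain(primary_values))
```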
The after_stat() function is the newer notation for ..variable.., such that after_stat(count/sum(count)) is the same as the (..count..) / sum(..count..) you use in your example. The difference is that you don't need to wrap every variable in ..'s, and it is generally more flexible. The after_stat() function causes whatever is inside it to be evaluated after the stat layer has computed the stats. The count variable is not an aesthetic you define; it is a computed variable that the stat layer produces, so we need after_stat() to do something with that variable.
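If it helps to see where count comes from, a sketch like the following pulls the computed data out of a histogram layer with layer_data(). The object names p and computed and the simulated data are assumptions for this example.

```r
library(ggplot2)

set.seed(42)                      # simulated data for illustration only
df <- data.frame(x = rnorm(100))

p <- ggplot(df, aes(x)) +
  geom_histogram(aes(y = after_stat(count / sum(count))), bins = 30)

# Data for the first layer after the stat has run; `count` is created by stat_bin()
computed <- layer_data(p, 1)
head(computed[, c("x", "count", "y")])

# y is what the after_stat() expression produced from the computed counts
all.equal(computed$y, computed$count / sum(computed$count))
```

The count column does not exist in df, which is why it can only be referenced inside after_stat().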
Upvotes: 1