Reputation: 2945
I understand that the difference between a scale and a coord transform in ggplot2 is that scale transforms are done before the statistic is computed and coordinate transforms are done after the statistic is computed. However, I'm having trouble understanding this with an actual example.
library(ggplot2)
library(gapminder)
base <- ggplot(data = gapminder,
               mapping = aes(x = year,
                             y = gdpPercap * pop,
                             color = continent)) +
  geom_line(stat = "summary")
base +
  scale_y_continuous(trans = "log10")
base +
  coord_trans(y = "log10")
The coord_trans() version results in the correct depiction of the data, but I do not understand why.
Note: I have seen this question and it did not fully help.
Upvotes: 1
Views: 638
Reputation: 66880
Here's a simpler example that should help explain the differences. Suppose we have two values in a data frame, 1 and 10. The mean of these is 11 / 2 = 5.5.
my_data = data.frame(y = c(1, 10))
mean(my_data$y)
#[1] 5.5
If we take the log (base 10) of those, we get 0 and 1. The average of the logs is (0 + 1) / 2 = 0.5. If we transform that back to the original scale, we get 10^0.5 = 3.162. So 10 raised to the mean of the logs (which is the geometric mean) is not the same as the arithmetic mean; the log "squishes" the large values so they have less of an impact on the average.
log10(my_data$y)
#[1] 0 1
mean(log10(my_data$y))
#[1] 0.5
10^mean(log10(my_data$y))
#[1] 3.162278
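To make the link to the geometric mean explicit, here is a small base-R sketch. The variable names (`arithmetic_mean`, `back_transformed`, `geometric_mean`) are just labels chosen for this illustration; the point is only that back-transforming the mean of the logs gives the n-th root of the product, not the arithmetic mean.

```r
# The same two values as above, spanning one order of magnitude.
y <- c(1, 10)

# Mean computed on the raw values.
arithmetic_mean <- mean(y)

# Mean computed on the log10 values, then mapped back to the original scale.
back_transformed <- 10^mean(log10(y))

# Geometric mean written out directly: n-th root of the product.
geometric_mean <- prod(y)^(1 / length(y))

print(c(arithmetic = arithmetic_mean,
        back_transformed = back_transformed,
        geometric = geometric_mean))
```

The last two numbers should agree (both are sqrt(10)), while the first stays at 5.5.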
We'll see the same thing if we plot this. Using a coord transformation controls the viewport and the spatial position of the data points (e.g. note that the vertical distance in pixels from 5.00 to 5.25 is a smidge bigger than the distance from 5.75 to 6.00, due to the log scale), but it doesn't change the data points; we still get an average of 5.5:
ggplot(my_data, aes(y = y, x = 1)) +
  geom_point(stat = "summary", fun = "mean") +
  coord_trans(y = "log10")
But if we switch to scale_y_log10(), the transformation is applied upstream of the mean calculation, so the value we get is ten to the mean of the logs, which we saw is not the same as the arithmetic mean.
ggplot(my_data, aes(y = y, x = 1)) +
  geom_point(stat = "summary", fun = "mean") +
  scale_y_log10()
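If you'd rather check the numbers than eyeball the plots, one option is to pull out the data each layer holds after the stat has run. This is a sketch that assumes a ggplot2 version where `layer_data()` and the `fun` argument of the summary stat are available (3.3.0 or later); `p_coord`, `p_scale`, `y_coord` and `y_scale` are names made up for this example.

```r
library(ggplot2)

my_data <- data.frame(y = c(1, 10))

# Same two plots as above, stored so we can inspect them.
p_coord <- ggplot(my_data, aes(y = y, x = 1)) +
  geom_point(stat = "summary", fun = "mean") +
  coord_trans(y = "log10")

p_scale <- ggplot(my_data, aes(y = y, x = 1)) +
  geom_point(stat = "summary", fun = "mean") +
  scale_y_log10()

# layer_data() returns the layer's data after scales and stats are applied,
# but before the coord is applied at draw time.
y_coord <- layer_data(p_coord)$y  # the stat saw the raw values
y_scale <- layer_data(p_scale)$y  # the stat saw the log10 values

y_coord      # mean of the raw values
y_scale      # mean of the log10 values, still in log10 units
10^y_scale   # back on the original scale: where the point is drawn
```

With the coord version the summarised value should be the arithmetic mean, while with the scale version it should be the mean of the logs, which only becomes a position on the original axis after back-transforming.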
Upvotes: 2