I have a dataset that I'm aggregating and plotting:
library(ggplot2)
d <- read.table("test.dat")  # the dataset listed below, as written by write.table()
agg <- aggregate(wt ~ t, data=d, FUN=mean)
# example 1
ggplot(agg, aes(x=t, y=wt)) + geom_line(size = 1.5)
# example 1log
ggplot(agg, aes(x=t, y=wt)) + geom_line(size = 1.5) + scale_y_log10()
# example 2
ggplot(d, aes(x=t, y=wt)) + stat_summary(fun.y="mean", geom="line", size = 1.5)
# example 2log
ggplot(d, aes(x=t, y=wt)) + stat_summary(fun.y="mean", geom="line", size = 1.5) +
  scale_y_log10()
Plots for example 2 and example 2log (images not included).
The problem is that although examples 1 and 2 produce identical plots, examples 1log and 2log differ, and example 2log is not even consistent with example 2.
Am I doing something wrong, or is this a bug?
I need the approach of example 2log because I want to aggregate over different subsets of the data, e.g.:
ggplot(data, aes(x=t, y=wt)) +
  stat_summary(data=subset(data, dim == 6 & maxt == 32 & max_trials == 10000 & t > 2),
               fun.y="mean", geom="line", color="black", size = 1.5) +
  stat_summary(data=subset(data, dim == 6 & maxt == 16 & max_trials == 1000 & t > 2),
               fun.y="mean", geom="line", color="black", size = 1.5) +
  scale_y_log10()
This is the dataset that reproduces the problem, as exported by write.table(d, "test.dat"):
"wt" "t"
"7" 12 3
"9" 18 4
"11" 28 6
"13" 14 7
"15" 81 9
"21" 97 10
"23" 3 11
"25" 12 12
"28" 46 13
"35" 1296 15
"37" 63 16
"39" 43 17
"41" 88 18
"43" 395 19
"45" 512 20
"47" 154 21
"49" 9 22
"51" 83 23
"53" 5 24
"55" 1606 25
"57" 3838 26
"59" 1331 27
"74" 23 3
"76" 20 4
"81" 79 5
"83" 32 6
"85" 14 7
"88" 24 8
"89" 9 9
"93" 67 10
"97" 44 11
"98" 18 12
"99" 101 13
"100" 17 14
"101" 19 16
"102" 41 18
"103" 9 19
"105" 26 20
"108" 76 21
"109" 2 22
"113" 883 23
"116" 2054 24
"137" 16 3
"139" 26 4
"140" 4 5
"144" 15 6
"145" 5 7
"150" 31 8
"155" 49 11
"168" 5700 12
"173" 12 3
"176" 40 6
"181" 89 7
"182" 2 8
"183" 4 9
"184" 5 10
"186" 35 11
"194" 357 12
"195" 13 13
"208" 2544 14
"209" 83 15
"210" 168 16
"211" 313 17
"212" 7 18
"213" 48 19
"214" 18 20
"215" 3 21
"216" 35 22
"230" 9 3
"233" 23 4
"235" 60 5
"236" 8 6
"237" 5 7
"238" 23 8
"239" 10 9
"240" 28 10
"241" 8 11
"242" 31 12
"244" 22 13
"245" 12 14
"246" 2 15
"247" 9 16
"261" 3475 17
"266" 1091 18
"267" 53 19
"268" 13 20
"269" 40 22
"270" 264 26
"271" 1726 27
"292" 43 3
"294" 22 4
"301" 48 5
"306" 81 6
"307" 5 7
"308" 25 8
"309" 12 9
"311" 12 10
"315" 63 13
"316" 2 14
"317" 30 15
This has to do with when the transformation happens if you transform via scale_y_*. A helpful note is in the help page of coord_trans, which says:
The difference between transforming the scales and transforming the coordinate system is that scale transformation occurs BEFORE statistics, and coordinate transformation afterwards.
Because the transformation happens before the statistics you calculate via stat_summary, your plot 2log is a plot of the mean of log10(wt) rather than of mean(wt) on the log10 scale. You can verify this by calculating the mean of log10(wt) for each level of t before graphing.
library(ggplot2)
# mean of log10(wt) for each t; aggregate() names the column `log10(wt)`
agg2 <- aggregate(log10(wt) ~ t, data=d, FUN=mean)
ggplot(agg2, aes(x=t, y=`log10(wt)`)) +
  geom_line(size = 1.5)
The shape of the line is the same as in 2log.
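Two ways around this can be sketched as follows. This is a minimal sketch, assuming ggplot2 >= 3.3.0 (where stat_summary() takes fun instead of fun.y); the toy data frame is hypothetical and reuses the values from the small example further down this thread. Option A keeps scale_y_log10() but makes the summary function undo the transformation: the stat receives log10(wt), so it averages on the original scale and re-applies log10. Option B uses coord_trans(), so the mean is computed on the raw wt and only the display is logged (recent ggplot2 releases may prefer the name coord_transform()). layer_data() shows the values each layer actually computed.

```r
library(ggplot2)

# Hypothetical toy data: two observations for each value of t
toy <- data.frame(t  = c(1, 1, 10, 10, 30, 30),
                  wt = c(1, 1, 20, 180, 1200, 1200))

# Option A: scale transformation happens before the stat, so y arrives as
# log10(wt); back-transform, take the arithmetic mean, then log10 again.
p_scale <- ggplot(toy, aes(x = t, y = wt)) +
  stat_summary(fun = function(y) log10(mean(10^y)), geom = "line") +
  scale_y_log10()

# Option B: coordinate transformation happens after the stat, so the mean
# is computed on the untransformed wt.
p_coord <- ggplot(toy, aes(x = t, y = wt)) +
  stat_summary(fun = mean, geom = "line") +
  coord_trans(y = "log10")

# Values computed by each layer (one row per t)
layer_data(p_scale)$y  # log10 of the per-t arithmetic means
layer_data(p_coord)$y  # per-t arithmetic means on the original scale
```

Both options draw the arithmetic mean of wt against a log10 axis, and either can be added once per subset as in the question. One design difference: with coord_trans() the segments between points may be drawn curved, because the line is interpolated in data space before being transformed, whereas Option A draws straight segments in log space.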
Very interesting problem. I think the combination of stat_summary and a transformed y scale behaves suspiciously.
I've created a simple example:
library(ggplot2)
data = data.frame(t=c(1,1,10,10,30,30), wt = c(1,1,20,180,1200,1200))
# mean computed by stat_summary, y axis on a log10 scale
ggplot(data, aes(x=t, y=wt)) +
  stat_summary(fun.y="mean", geom="line", color="black", size = 1.5) +
  scale_y_log10()
# mean precomputed with aggregate(), then plotted on a log10 scale
d <- aggregate(wt ~ t, data=data, FUN=mean)
ggplot(d, aes(x=t, y=wt)) + geom_line(size = 1.5) + scale_y_log10()
The plots I get are: (images not included)
Also, if you run the same code without scale_y_log10(), the two plots are exactly the same.
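The difference can be checked by hand for this toy data. Under scale_y_log10() the stat averages log10(wt), and back-transforming that average gives the geometric mean rather than the arithmetic mean. At t = 1 and t = 30 both observations are equal, so the two means coincide; only t = 10, with wt values 20 and 180, differs. A minimal sketch:

```r
# wt values at t = 10 in the toy data
wt <- c(20, 180)

arith <- mean(wt)            # what aggregate(wt ~ t, FUN = mean) computes: 100
geom  <- 10^mean(log10(wt))  # what stat_summary() effectively plots after scale_y_log10(): about 60

c(arithmetic = arith, geometric = geom)
```

Without the log scale, stat_summary() averages the raw wt values, which is why the two plots agree in that case.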