Jorge Leitao

Reputation: 20123

Bug in stat_summary with scale_y_log10 in ggplot?

I have a dataset that I'm aggregating and plotting:

library(ggplot2)

d <- read.table("test.dat")  # the dataset listed at the end of this question

agg <- aggregate(wt ~ t, data=d, FUN=mean)

# example 1
ggplot(agg, aes(x=t, y=wt)) + geom_line(size = 1.5)

# example 1log
ggplot(agg, aes(x=t, y=wt)) + geom_line(size = 1.5) + scale_y_log10()

# example 2
ggplot(d, aes(x=t, y=wt)) + stat_summary(fun.y="mean", geom="line", size = 1.5)

# example 2log
ggplot(d, aes(x=t, y=wt)) + stat_summary(fun.y="mean", geom="line", size = 1.5) + 
    scale_y_log10()

Example 2:

[plot: Example 2]

Example 2log:

[plot: Example 2log]

The problem is that although examples 1 and 2 produce the same plot, examples 1log and 2log differ, and example 2log is not even consistent with example 2.

Am I doing something wrong, or is this a bug?

I need the example 2log approach because I want to aggregate under different conditions, e.g.

ggplot(data, aes(x=t, y=wt)) +
  stat_summary(data=subset(data, dim == 6 & maxt == 32 & max_trials == 10000 & t > 2), fun.y="mean", geom="line", color="black", size = 1.5) + 
  stat_summary(data=subset(data, dim == 6 & maxt == 16 & max_trials == 1000 & t > 2), fun.y="mean", geom="line", color="black", size = 1.5) + scale_y_log10()

This is the dataset I'm using, which reproduces the problem, as exported by write.table(d, "test.dat"):

"wt" "t"
"7" 12 3
"9" 18 4
"11" 28 6
"13" 14 7
"15" 81 9
"21" 97 10
"23" 3 11
"25" 12 12
"28" 46 13
"35" 1296 15
"37" 63 16
"39" 43 17
"41" 88 18
"43" 395 19
"45" 512 20
"47" 154 21
"49" 9 22
"51" 83 23
"53" 5 24
"55" 1606 25
"57" 3838 26
"59" 1331 27
"74" 23 3
"76" 20 4
"81" 79 5
"83" 32 6
"85" 14 7
"88" 24 8
"89" 9 9
"93" 67 10
"97" 44 11
"98" 18 12
"99" 101 13
"100" 17 14
"101" 19 16
"102" 41 18
"103" 9 19
"105" 26 20
"108" 76 21
"109" 2 22
"113" 883 23
"116" 2054 24
"137" 16 3
"139" 26 4
"140" 4 5
"144" 15 6
"145" 5 7
"150" 31 8
"155" 49 11
"168" 5700 12
"173" 12 3
"176" 40 6
"181" 89 7
"182" 2 8
"183" 4 9
"184" 5 10
"186" 35 11
"194" 357 12
"195" 13 13
"208" 2544 14
"209" 83 15
"210" 168 16
"211" 313 17
"212" 7 18
"213" 48 19
"214" 18 20
"215" 3 21
"216" 35 22
"230" 9 3
"233" 23 4
"235" 60 5
"236" 8 6
"237" 5 7
"238" 23 8
"239" 10 9
"240" 28 10
"241" 8 11
"242" 31 12
"244" 22 13
"245" 12 14
"246" 2 15
"247" 9 16
"261" 3475 17
"266" 1091 18
"267" 53 19
"268" 13 20
"269" 40 22
"270" 264 26
"271" 1726 27
"292" 43 3
"294" 22 4
"301" 48 5
"306" 81 6
"307" 5 7
"308" 25 8
"309" 12 9
"311" 12 10
"315" 63 13
"316" 2 14
"317" 30 15

Upvotes: 3

Views: 437

Answers (2)

aosmith

Reputation: 36086

This comes down to when the transformation is applied if you transform via scale_y_*. A helpful note is in the help page of coord_trans, which says:

The difference between transforming the scales and transforming the coordinate system is that scale transformation occurs BEFORE statistics, and coordinate transformation afterwards.

Because the transformation happens before the statistics you are calculating via stat_summary, your plot 2log shows the mean of log10(wt), not mean(wt) drawn on a log10 scale. You can verify this by calculating the mean of log10(wt) for each level of t before graphing.

# Use log10 (not the natural log) so the values match what scale_y_log10() computes
agg2 <- aggregate(log10(wt) ~ t, data=d, FUN=mean)

ggplot(agg2, aes(x=t, y=`log10(wt)`)) + 
    geom_line(size = 1.5)

The shape of the line is the same as in 2log.
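To see why the two orders of operation disagree, here is a small numeric check. It is only a sketch: the two values are borrowed from the t = 10 group of the toy data in the other answer, not from the question's dataset, and the variable names are made up for illustration.

```r
# Two illustrative values (assumption: taken from the toy example, not from d)
wt <- c(20, 180)

# What examples 1log shows: the arithmetic mean, drawn on a log10 axis
log_of_mean <- log10(mean(wt))    # log10(100) = 2

# What example 2log shows: the scale logs the data first, then the mean is taken
mean_of_log <- mean(log10(wt))    # log10(60), roughly 1.778

# Back-transforming the mean of the logs gives the geometric mean, not 100
geometric_mean <- 10^mean_of_log  # 60
```

So the line in 2log sits at the geometric mean of each group, which is never above the arithmetic mean and falls further below it the more spread out the values in that group are.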


Upvotes: 1

AntoniosK

Reputation: 16121

Very interesting problem. I think that the combination of stat_summary and the different y scale behaves suspiciously.

I've created a simple example:

library(ggplot2)

data = data.frame(t=c(1,1,10,10,30,30), wt = c(1,1,20,180,1200,1200))


ggplot(data, aes(x=t, y=wt)) +
  stat_summary(data=data, 
               fun.y="mean", geom="line", color="black", size = 1.5)+
  scale_y_log10()


d <- aggregate(wt ~ t, data=data, FUN=mean)

ggplot(d, aes(x=t, y=wt)) + geom_line(size = 1.5) + scale_y_log10()

The plots I get are:

[two plot images, one for each ggplot call above]

Also, if you run the above process without scale_y_log10(), the two approaches give exactly the same plot.
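If the coord_trans note quoted in the other answer is the explanation, then two workarounds seem plausible with this toy data. This is a rough sketch rather than a definitive fix; the object names (toy, p_coord, toy_means, p_agg) are made up here, and the fun argument assumes ggplot2 3.3.0 or later (older versions call it fun.y).

```r
library(ggplot2)

# Same toy data as in the example above
toy <- data.frame(t  = c(1, 1, 10, 10, 30, 30),
                  wt = c(1, 1, 20, 180, 1200, 1200))

# Option A: let stat_summary average the raw values, then log-transform
# only the coordinate system, which is applied after the statistic.
p_coord <- ggplot(toy, aes(x = t, y = wt)) +
  stat_summary(fun = mean, geom = "line") +
  coord_trans(y = "log10")

# Option B: aggregate first, then plot the means on a log10 scale.
# geom_line() computes no summary, so the scale cannot change the means.
toy_means <- aggregate(wt ~ t, data = toy, FUN = mean)
p_agg <- ggplot(toy_means, aes(x = t, y = wt)) +
  geom_line() +
  scale_y_log10()
```

Both should pass through the same mean at each t. The segments between points may look different, since coord_trans transforms the drawn line itself, and the default axis breaks are not the same as with scale_y_log10().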

Upvotes: 0
