Andrea

Reputation: 110

Transformation of aesthetic inputs in R and ggplot2

Is there a way to transform data in ggplot2 in the aes declaration of a geom?

I have a plot conceptually similar to this one:

library(ggplot2)
test <- data.frame(k = rep(1:3, 3), ce = rnorm(9), comp = factor(rep(1:3, each = 3)))
plot <- ggplot(test, aes(y = ce, x = k)) + geom_line(aes(lty = comp))

Suppose I would like to add a line showing, for each value of k, the maximum of ce across the three levels of comp, with only the plot object available. I have tried several options (e.g. using aggregate in the aes declaration, stat_function, etc.) but could not find a way to make this work.

At the moment I am working around the problem by extracting the data frame with ggplot_build, but I would like to find a direct solution.

Upvotes: 0

Views: 128

Answers (3)

Artjom B.

Reputation: 61952

Thanks to JLLagrange and jlhoward for your help. However, both solutions require access to the underlying data.frame, which I do not have. This is the workaround I am using, based on the previous example:

data=ggplot_build(plot)$data[[1]]
cemax=with(data,aggregate(y,by=list(x),max))
plot+geom_line(data=cemax,aes(x=Group.1,y=x),colour="green",alpha=.3,lwd=2)

This does not require direct access to the dataset, but to me it is a very inefficient and inelegant solution. Obviously if there is no other way to manipulate the data, I do not have much of a choice :)

Upvotes: 1

jlhoward

Reputation: 59425

EDIT (Response to OP's comment):

OK I see what you mean now - I should have read your question more carefully. You can achieve what you want using stat_summary(...), which does not require access to the original data frame. It also solves the problem I describe below (!!).

library(ggplot2)
set.seed(1)
test <- data.frame(k=rep(1:3,3),ce=rnorm(9),comp=factor(rep(1:3,each=3)))
plot <- ggplot(test,aes(y=ce,x=k))+geom_line(aes(lty=comp))
##
plot + stat_summary(fun=max, geom="line", col="red")  # use fun.y=max on ggplot2 < 3.3.0
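To check that the summary layer really holds a per-k maximum, its computed values can be compared with a manual aggregation. This is only a minimal sketch: the names `p`, `summary_layer` and `expected` are illustrative, and it assumes a ggplot2 version (3.3.0 or later) where the argument is called `fun` rather than `fun.y`.

```r
library(ggplot2)

set.seed(1)
test <- data.frame(k = rep(1:3, 3), ce = rnorm(9), comp = factor(rep(1:3, each = 3)))

p <- ggplot(test, aes(y = ce, x = k)) +
  geom_line(aes(lty = comp)) +
  stat_summary(fun = max, geom = "line", colour = "red")

# Layer 2 is the stat_summary layer; layer_data() returns its computed values.
summary_layer <- layer_data(p, 2)
summary_layer <- summary_layer[order(summary_layer$x), ]

# Manual per-k maximum for comparison, ordered by k.
expected <- as.numeric(tapply(test$ce, test$k, max))

all.equal(summary_layer$y, expected)
```

Because the summary is computed inside the layer, nothing here needs the original data frame beyond what the plot object already carries.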

Original Response (Requires access to original df)

One unfortunate characteristic of ggplot is that aggregating functions (like max, min, mean, sd, sum, and so on) used in aes(...) operate on the whole dataset, not subgroups. So

plot + geom_line(aes(y=max(ce)))

will not work: max(ce) is evaluated once over all of test$ce, so you get a horizontal line at the global maximum instead of a separate maximum for each k.
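A quick way to see this is to inspect the values ggplot2 actually computes for such a layer. A minimal sketch, with `p` and `flat` as illustrative names and the plot reduced to the single problematic layer:

```r
library(ggplot2)

set.seed(1)
test <- data.frame(k = rep(1:3, 3), ce = rnorm(9), comp = factor(rep(1:3, each = 3)))

# max(ce) is evaluated once on the whole data frame and then recycled.
p <- ggplot(test, aes(y = ce, x = k)) + geom_line(aes(y = max(ce)))

flat <- layer_data(p, 1)

unique(flat$y)  # a single value for every k
max(test$ce)    # the global maximum it corresponds to
```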

One way around this, which is basically the same as @JLLagrange's answer (but doesn't use external libraries), is:

plot+geom_line(data=aggregate(ce~k,test,max),colour="red")

This creates a data frame dynamically aggregating ce by k using the max function.
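For reference, the aggregated data frame can be inspected on its own using base R only. In this small sketch `max_by_k` is an illustrative name; the point is that the result keeps the column names k and ce, which is why the layer can inherit the plot's x and y mappings without a new aes(...).

```r
set.seed(1)
test <- data.frame(k = rep(1:3, 3), ce = rnorm(9), comp = factor(rep(1:3, each = 3)))

# One row per k, holding the largest ce observed at that k.
max_by_k <- aggregate(ce ~ k, data = test, FUN = max)

names(max_by_k)  # same column names as the plot's x and y mappings
nrow(max_by_k)   # one row per distinct k
```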

Upvotes: 0

colcarroll

Reputation: 3682

Is

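library(plyr)     # provides ddply(), summarise() and .()
library(ggplot2)
# Per-k maximum of ce, stored as its own data frame
max.line = ddply(test, .(k), summarise, ce = max(ce))
plot = ggplot(test, aes(y=ce,x=k))
plot = plot + geom_line(aes(lty=comp))
# The extra layer inherits x=k and y=ce from the plot
plot = plot + geom_line(data=max.line, color='red')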

something like what you want?

Upvotes: 1
